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bound  was  performed;  2)  Development  pf  nonlinear  forward  and  inverse 
models  in  the  visible  through  thermal  IR  region;  3)  The  development  of 
a  linear  inverter  using  all  frequencies;  and  4)  The  development  of 
nonparametric  decision  trees  using  visible  through  -thermal  IR  to  discri¬ 
minate  clear  columns  over  loam,  clear  columns  over  snow,  water  clouds 
and  ice  clouds. 
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"'^.Results  indicate  that  the  iterated' extended  Kalman  filter  is  a 
robust  method  of  cloud  parameter  estimation",? that  while  some  parameters 
such  as  cloud  top  height  and  liquid  water  content  are  well  estimated, 
others  such  as  cloud  thickness  are  not.  Finally,  simple  decision  trees 
may  be  formulated  to  distinguish  the  scenes  analyzed. 
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I.  INTRODUCTION 


The  overall  purpose  of  this  study  has  been  the  development  of 
methods  to  improve  the  identification  of  cloud  features  on  a  global 
basis  through  the  interpretation  of  multi  spectral  measurements  obtained 
from  satellite-borne  passive  spectrometers.  For  many  data  sparse  areas, 
surface  observations  of  cloud  conditions  are  unavailable  and  cloud  fea¬ 
tures  can  only  be  determined  from  satellite  sensors.  Inherent  limita¬ 
tions  in  surface  observations,  such  as  the  inability  to  detect  clouds 
above  an  overcast  layer,  may  be  also  resolved  from  space.  Finally, 
requirements  for  data  such  as  cloud  heights,  areal  extent,  cloud  phase, 
rainfall  and  integrated  water  content  makes  desirable  the  use  of  all 
available  information  provided  by  surface  and  satellite  observations. 

The  specific  parameters  of  interest  include: 

1)  Cloud  State 

A)  Glaciated  cloud  top  and/or  presence  of  glaciated  cloud 

B)  Total  cloud  water  content 

C)  Presence  of  rain  and/or  rainfall  rate 

D)  Cloud  height/thickness 

2)  Surface  State 

A)  Foam  coverage/wind  speed  over  ocean 

B)  Snow/ice  coverage 

3)  Integrated  Atmospheric  Water  Vapor 
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A  specific  objective  of  this  study  has  been  to  develop  tech¬ 
niques  which  are  suitable  for  inclusion  in  the  operational  Air  Force 
3-D  Nephanalysis  Program  (3DNEPH),  in  terms  of  its  cloud  identification 
capability.  A  unifying  statistical  approach  called  Bayesian  Decision 
Analysis  is  presented  and  its  potential  for  application  to  multispectral 
cloud  identification  is  shown. 

1 .1  Air  Force  Motivation 

The  Air  Force  at  present  lacks  a  capability  for  obtaining 
detailed  specialized  weather  intelligence  over  many  areas  of  the  world. 
This  intelligence  is  specially  needed  over  denied  areas  of  the  battle¬ 
field  and  the  data  are  also  required  by  the  Air  Force  Global  Weather 
Central  (AFGWC)  In  order  to  provide  adequate  meteorological  support  to 
many  customers. 

Experience  in  conflicts  such  as  Vietnam  has  demonstrated  the 
need  for  detailed  cloud  ceiling  and  visibility  information  over  a  target. 
The  advantages  of  precision-guided  munitions  over  conventional  bombing 
dictate  that  precision-guided  munitions  be  used  whenever  possible. 
Precision-guided  munitions,  however,  require  high  cloud  ceilings  and 
good  visibility  to  ensure  that  the  pilots  will  be  able  to  identify  the 
target.  Radiometric  satellite  sensors  are  already  providing  some  cloud 
information,  but  proper  processing  is  still  badly  needed  in  order  to 
obtain  better  definition  of  cloud  layering,  cloud  cover  and  cloud  top 
heights  over  tactical-sized  targets. 
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Due  to  the  great  climatic  extremes  that  may  be  encountered  In 
modern  warfare,  weather  observations  of  specialized  cloud  parameters 
will  be  required  under  difficult  observing  conditions.  The  difference 
between  successful  and  unsuccessful  delivery  of  an  intercontinental 
missile,  for  instance,  may  hinge  on  whether  it  is  known  that  the  reentry 
will  take  place  through  an  all  ice  or  an  all  water  cloud.  An  adequate 
capability  for  processing  presently  existing  and  planned  microwave 
sensor  data  will  enhance  the  likelihood  that  such  discrimination  will 
be  available  when  needed. 

In  order  to  achieve  a  solution  to  the  total  Air  Force  problem, 
effort  will  be  required  in  two  major  areas.  New  and  improved  sensors 
must  be  developed  in  order  to  completely  view  those  parts  of  the  spectrum 
containing  meteorologically  significant  information.  The  processing 
techniques  for  satellite  data  must  be  improved.  Current  techniques  are 
often  suboptimum  in  their  retrieval  capacity  and  the  total  volume  of 
data  taxes  present  processing  resources. 

1 .2  Meteorological  Satellite  Sensors 

1.2.1  Operational  Satellite  Systems 

Present  operational  meteorological  satellites  include  the  DMSP, 

NOAA,  and  GOES  systems.  The  satellites  all  carry  visible  and  thermal- 
infrared  radiometers.  The  two-channel  imaging  radiometers  are  designed 
to  make  observations  of  clouds  and  of  the  earth's  surface.  The  visible 
channels  are  used  to  observe  cloud  distribution,  cloud  type,  and  snow 

1 
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and  ice  distribution.  The  thermal  infrared  channels  are  used  to  measure 
cloud  top  temperatures  and  surface  temperatures.  Based  on  the  measured 
thermal  patterns,  it  is  also  possible  to  determine  cloud  distribution, 
high  cloud  types,  and  sea  ice  distribution;  with  corrections  for  atmos¬ 
pheric  effects,  sea  surface  temperatures  can  be  measured  directly  using 
the  thermal  infrared. 

The  characteristics  of  each  of  the  sensor  systems  are  described 
in  the  following  sections. 

DMSP.  The  Department  of  Defense's  Defense  Meteorological 
Satellite  Program  (DMSP)  consists  of  sun  synchronous  satellites  in  a 
polar  orbit  450  nautical  miles  (830  km)  above  the  earth.  In  the  oper¬ 
ational  mode,  two  DMSP  satellites  provide  imagery  every  six  hours  over 
any  spot  on  earth  (sunrise-sunset,  noon-midnight).  This  imagery  is  in 
the  visible/near  infrared  and  thermal  infrared  spectral  intervals  over 
a  1600  nm  (3200  km)  swath  below  the  satellite. 

Real  time  meteorological  data  within  the  acquisition  range, 
approximately  a  radius  of  1400  nm  (2800  km)  of  the  receiving  station, 
is  provided  to  the  D0D  tactical  sites.  These  sites  are  scattered  around 
the  globe  and  onboard  U.S.  Navy  aircraft  carriers.  The  Air  Force  Global 
Weather  Central  (AFGWC)  at  Offutt  Air  Force  Base,  Nebraska,  receives 
stored  data  of  global  coverage  from  two  command  readout  sites. 
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The  present  capability  of  the  DMSP  includes  the  following: 


DATA  TYPE 

SPECTRAL  INTERVAL 

RESOLUTION 

Light  Fine  (LF) 

.4  to  1.1  pm 

0.6  km 

Thermal  Fine  (TF) 

8  to  13  pm 

0.6  km 

Light  Smoothed  (LS)* 

.4  to  1.1  pm 

3  km 

Thermal  Smoothed  (TS) 

8  to  13  pm 

3  km 

*LS  data  has  a  low  light 

capability  that  "sees 

at  night. 

NOAA.  The  NOAA  series 

of  satellites  carry 

the  Scanning  Radio- 

meter  (SR)  and  the  Very  High  Resolution  Radiometer  (VHRR) ,  Both  the 

VHRR  and  SR  data  have  been  available  twice-per-day  from  the  polar-orbiting 

NOAA  satellite  (one  daytime  and 

one  nighttime  pass) 

since  the  first  of 

the  series  was  placed  into  operation  in  early  1973. 

teri sties  are  summarized  below: 

The  sensor  charac- 

SENSOR 

SPECTRAL  INTERVAL 

RES0LUTI0N 

VHRR-Visible 

0 , 5  -  0 . 7  pm 

0.9  km 

VHRR-Thermal  Infrared 

10,5-12,5  pm 

0.9  km 

SR-Visible 

0,5  -  0.7  pm 

4  km 

SR-Thermal  Infrared 

10.5-12.5  pm 

4  km 
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GOES.  The  third  type  of  operational  meteorological  satellite 
data  is  from  the  GOES  (Geostationary  Operational  Environmental  Satellite). 
GOES-EAST  launched  in  1974,  is  stationed  over  the  equator  in  an  earth 
synchronous  orbit  in  a  position  to  view  the  eastern  United  States  and 
Atlantic  Ocean.  GOES-WEST  is  in  a  position  to  view  the  western  United 
States  and  eastern  Pacific.  Because  of  its  earth  synchronous  orbit, 

GOES  provides  hemispheric  coverage  on  an  essentially  continuous  basis, 
with  data  being  collected  every  one-half  hour.  Using  the  sequential 
coverage,  cloud  motions  can  be  tracked,  and  the  wind  flow  at  cloud  level 
can  be  derived.  The  characteristics  of  the  Visual  and  Infrared  Spin 
Scan  Radiometer  (VISSR)  are  as  follows: 

SENSOR  SPECTRAL  INTERVAL  RESOLUTION 

VISSR-Vi si bl e  0,55  -  0,75  pm  1  km 

VISSR-Thermal  Infrared  10.5  -  12.5  pm  8  km 


1.2.2  Experimental  Satellite  Systems 

The  principal  experimental  meteorological  satellites  have  been 
those  of  the  NASA  Nimbus  series.  These  spacecrafts  have  carried  many 
passive  remote  imagers  including  the  THIR  (Temperature  Humidity  Infrared 
Radiometer)  infrared  imager  and  the  ESMR  (Electrically  Scanning  Microwave 
Radiometer)  microwave  imager.  The  first  Nimbus  was  launched  in  1964; 
Nimbus-7  is  scheduled  for  late  1978.  The  THIR  has  two  channels,  one  in 
the  6. 3-7. 2  pm  interval  and  one  in  the  10,5-12,5  pm  window.  The  6.7  pm 
channel  has  application  for  detecting  atmospheric  water  vapor, 

_ ..  ■  . . _ _ . _ _ 
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The  initial  microwave  sensor  was  the  ESMR  (Electrically  Scanning 
Microwave  Radiometer)  flown  on  Nimbus-5.  The  ESMR-5  has  a  frequency  of 
19,35  GHz  and  horizontal  polarization  whereas  the  ESMR-6  (Nimbus-6)  has 
a  frequency  of  37.0  GHz  and  both  horizontal  and  vertical  polarizations; 
the  maximum  resolution  of  ESMR  is  of  the  order  of  25  km.  ESMR  provides 
the  capability  of  viewing  the  earth's  surface  through  clouds  and  of 
mapping  quantitative  rainfall  rates  over  oceans.  Of  the  two  frequencies, 
the  37.0  GHz  instrument  is  more  sensitive  to  detecting  liquid  clouds. 

The  37.0  GHz  radiometer  is  also  useful  for  detecting  sea  state  and  for 
mapping  sea  ice  boundaries;  it  is  also  possible  to  distinguish  new  and 
old  sea  ice  in  the  microwave  data. 

Although  the  ESMR-5  and  ESMR-6  sensors  have  been  single  fre¬ 
quency  radiometers,  studies  have  indicated  that  a  combination  of  fre¬ 
quencies,  such  as  19.35,  37.0,  and  94.0  GHz,  would  likely  provide 
considerably  more  information  than  either  frequency  alone.  For  example, 
clouds  can  be  detected  at  94  GHz,  light  rain  at  37  GHz,  and  moderate 
rain  at  19  GHz.  By  using  a  combination  of  frequencies,  therefore,  it 
should  be  possible  to  measure  rainfall  rates  more  accurately. 

Although  not  designed  for  meteorological  applications,  Landsat 
and  Skylab  have  also  provided  useful  cloud  information,  both  with  regard 
to  cloud  structure  and  with  regard  to  distinguishing  cloud  from  the 
underlying  surface.  The  Landsat  Multispectral  Scanner  (MSS)  has  four 
channels  in  the  visible  and  extending  into  the  near- infrared;  the  spectral 
intervals  are;  0.5-0. 6  urn,  0.6-0. 7  ym,  0.7-0. 8  ym,  and  0.8-1 .1  ym. 
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The  MSS  has  a  resolution  of  70-100  m.  Although  frequent  repetitive 
coverage  with  Landsat  is  not  possible,  Landsat  images  have  been  found 
to  be  useful  for  studying  detailed  mesoscale  cloud  structure. 

The  Skylab  missions  were  only  for  relatively  short  time  periods, 
the  longest  being  the  85  day  Skylab-4  mission.  Of  particular  interest, 
however,  are  the  data  from  the  13-channel  Si 92  Mul ti spectral  Scanner, 
which  is  the  only  spacecraft  instrument  that  has  provided  measurements 
in  several  near-infrared  bands  out  to  2  ym.  Using  the  Skylab  S192  data, 
studies  have  shown  that  the  reflectance  of  snow  surfaces  drops  drama¬ 
tically  in  the  near-infrared,  such  as  in  the  spectral  interval  of 
1.55-1.75  urn.  The  reflectance  of  water  clouds,  however,  remains  high 
in  this  same  band.  Therefore,  it  appears  that  an  operational  sensor  in 
the  near-infrared  could  be  useful  for  distinguishing  automatically  between 
clouds  and  underlying  snowcover.  Moreover,  aircraft  measurements  in 
the  near-infrared  have  indicated  the  potential  for  distinguishing  water 
droplet  clouds  from  ice  crystal  clouds. 
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1.2.3  Summary  of  Applications  of  Present  and  Potential  Sensors 


The  applications  of  sensors  on  present  satellites  and  potential 
sensors  are  summarized  below. 


Visible 


0.4- 1.0  pm  cloud  distribution,  cloud  type, 
snow/ice  distribution 

0.55-0.90  pm  cloud  distribution,  cloud  type, 
snow/ice  distribution 

0.725-1.0  pm  (red-near  infrared)  cloud  distribution, 
cloud  type,  characteristics  of  snow/ice 
associated  with  melting 

Near-IR 


1.6  pm  distinguish  clouds  from  underlying 

snow/ice;  distinguish  water  droplet 
clouds  from  ice  crystal  clouds 


2.1  pm  distinguish  clouds  from  underlying 

snow/ice;  distinguish  water  droplet 
clouds  from  ice  crystal  clouds 

3.8  pm  (near-IR/window)  measure  cloud-top  and 

surface  temperatures  during  nighttime 


Thermal -IR 

6.7  pm  estimate  atmospheric  water  vapor 


8.0-13.0  pm  map  cloud  distributions,  certain  cloud 
types,  ice  distributions,  estimate 
cloud  temperatures 


10.5- 11.5  pm  map  cloud  distributions,  certain  cloud 

11.5- 12,5  pm  types,  ice  distributions,  estimate  cloud 

top  temperatures,  sea  surface  temperatures 


Microwave 


10.7  GHz 


19  GHz 


37  GHz 


94  GHz 


estimate  rainfall  over  ocean,  detect 
sea  state,  determine  snow  and  ice 
morphology 

view  earth's  surface  through  clouds, 
detect  sea  state,  measure  rainfall 
over  ocean,  detect  soil  moisture, 
detect  ice/water  contrast 

view  earth's  surface  through  some  clouds, 
detect  sea  state,  map  ice  boundaries, 
distinguish  new  and  old  ice,  estimate 
light  rainfall  and  some  cloudiness  over 
ocean,  detect  soil  moisture 

sense  clouds  over  ocean,  detect  surface 
characteristics  such  as  soil  moisture, 
snow  and  ice,  sea  state,  possibly  detect 
rainfall  over  land 


Visible  and  near-infrared  sensors  detect  sunlight  (or  in 
some  instances,  moonlight)  reflected  from  clouds  and  surfaces.  Thermal - 
IR  and  microwave  sensors  detect  emitted  energy  from  clouds  and  surfaces. 
An  exception  is  the  3.8  pm  window  which  sees  significant  quantities  of 
both  solar  and  terrestrial  energy  during  daytime. 


1 .3  Three  Dimensional  Nephanalysis 

In  addition  to  the  generation  of  weather  imagery,  data  from 
the  Defense  Meteorological  Satellite  Program  satellites  are  input  to 
a  nephanalysis  program  (3DNEPH)  which  ultimately  provides  cloud  height, 
coverage  and  type  information  to  eventual  users.  Although  the  original 
documentation  of  the  program  (Coburn,  1971)  implies  a  secondary  role  for 


the  satellite  data  with  respect  to  its  use  in  conjunction  with  surface, 
aircraft  and  other  reports,  the  present  operation  considers  the  satellite 
data  to  be  fundamental  (Fye,  1978),  Separate  analyses  are  performed  on 
data  streams  from  the  IR  and  visible  spectral  regions,  to  determine  cloud 
coverage.  Cloud  heights  are  derived  from  the  IR  data  alone,  whereas 
cloud  type  estimates  depend  on  visible,  IR,  or  both  forms  of  data  depending 
on  what  is  available. 

One  feature  of  analysis  schemes  used  within  the  3DNEPH  program 
is  the  absence  of  explicit  dependence  on  spatial  information  and  reliance 
on  computationally  efficient  procedures  using  first  order  statistics. 
Examples  of  these  statistics  are  mean  brightness  and  variability, 
defined  as: 


-  1  N 

mean  brightness,  B  =  ^  l  B. 


(1.3.1) 


*|  *1  N  _ 

and  variability,  V  =  j  +  1 


(1.3.2) 


Because  the  data  as  used  represent  an  8x8  square  array  of  25  nm  on  a 
side,  N  = 64  and  Bi  represents  data  with  a  roughly  3  nm  inter-sample 
interval.  Infrared  data  are  used  to  construct  a  histogram  which  generally 
is  Dol.ymodal  (usually  one,  two  or  three  "distinct"  modes).  Cold  modes 
are  related  to  cloud  layers.  The  warmest  mode  may  be  assigned  to  the 
ground. 


In  the  simplest  terms,  the  success  of  3DNEPH  has  depended  on 
the  validity  of  two  general  rules  and  their  converses: 

1.  Clouds  have  a  high  albedo  (and  the  implied  converse 
that  a  high  albedo  denotes  a  cloud);  and 

2.  High  clouds  are  colder  than  low  clouds  or  ground 
(and  the  implied  converse  that  a  cold  area  denotes 
a  high  cloud). 

However,  these  rules  are  not  valid  in  a  variety  of  special 
circumstances .  Ambiguities  arise,  for  example,  when  they  are  applied 
in  estimating  cloud  amount  over  highly  reflective  coastlines,  snow  areas, 
deserts  and  mountains,  and  over  cold  regions  such  as  the  poles.  The 
AFGWC  has  developed  extensive  processing  to  provide  background  bright¬ 
nesses  and  temperatures.  Bivariate  analysis  of  visible  and  IR  data 
resolves  some  ambiguities  and  is  used  for  cloud  typing.  A  good  summary 
of  the  joint  IR  and  visible  characteristics  of  data  from  meteorological 
satellites  is  given  in  Anderson,  et  al.  (1974). 

The  cloud  field,  described  by  the  output  variables  of  total 
cover,  layered  cloud  amount,  and  cloud  type,  represents  a  mixture  of 
classification  data  (cloud  type),  and  estimation  (total  cover  and  layered 
amount).  Data  may  be  treated  as  strictly  the  output  of  a  classification 
scheme  by  quantizing  the  formally  continuous  variables,  and,  in  fact, 
present  3DNEPH  procedures  do  produce  reports  of  percentage  cloud  cover 
based  on  the  number  of  cells  in  the  1/8  grid  analysis  window  that  are 
judged  to  be  cloud  covered.  Within  the  classification  schemes,  decisions 
tend  to  be  based  on  properties  of  samples  as  a  group,  e.g.,  average 


-13- 


brightness,  modal  (histogram)  analysis,  range,  and  variability.  The 
total  process  is  thus  one  of  data  reduction,  where  sensor  output  (sensed 
radiation)  is  subject  to  feature  extraction  which  provides  a  relatively 
small  number  of  important  measures  to  the  classification  scheme.  There 
are  three  major  ways  in  which  the  above  classification  scheme  can  be 
aided.  First,  more  and  perhaps  better  input  data  could  be  provided  by 
sensors  which  measure  radiation  in  other  frequency  bands.  (Note  that 
the  sensors  often  measure  physical  properties  which  only  imply  the  output 
of  the  classifier,  i.e.,  satellite  radiometers  directly  measure  energy 
and  not  cloud  amount).  Second,  the  list  of  derived  statistics  could  be 
augmented,  or  improved  in  some  other  manner.  Finally,  action  of  the 
classifier  could  be  changed  by  modifying  the  classification  procedure. 

A  recent  survey  made  by  Pickett  and  Blackman  (1976),  summarizes 
the  state  of  nephanalysis  at  AFGWC  as  of  April,  1976.  Before  treating, 
in  subsequent  sections,  some  of  the  potential  gains  from  joint  use  of 
information  in  several  (electromagnetic)  spectral  bands,  it  is  helpful 
to  put  the  recent  evolution  of  3DNEPH  in  perspective. 

Because  of  processing  time  constraints,  3DNEPH  was  originally 
designed  to  use  little  in  the  way  of  analytical  power,  and  emphasized 
strong  compression  of  the  data  stream.  Furthermore,  the  compression 
was  not  simply  on  output,  with  the  64  visible  and  64  IR  radiometer  data 
points  processed  to  yield  the  1/8  mesh  output  report  (layers,  cloud  type 
and  coverage),  but  involved  Intermediate  compression  or  data  reduction, 
as  shown  in  Eqs.  1.3.1  -1.3.2,  In  fact,  the  highest  resolution  data 
from  the  satellites  is  not  used  directly  in  3DNEPH,  again  because  of  data 


flow  and  processing  time  constraints.  Given  the  scenario  in  which 
3DNEPH  was  designed  to  operate,  tradeoffs  between  computation  time  and 
full  use  of  data  (dimensionality  of  feature  space)  were  reasonably  made. 
With  the  advent  of  higher  resolution  and  multispectral  systems,  however, 
the  data  compression  problem  is  more  severe.  Yet,  justification  for 
these  new  systems  requires  that  these  new  data  be  fully  exploited. 

The  improvement  of  remote  sensing  capability  primarily  in  terms 
of  increased  (electromagnetic)  spectral  information  makes  amenable  to 
solution  several  problems  (e.g,,  cloud  identification)  which  could  not 
previously  be  properly  treated.  Also,  some  of  the  simplifying  assumptions 
implicit  in  3DNEPH,  e.g.,  that  satellite  radiometers  sense  cloud  top 
temperature  and  therefore  cloud  top  height  rather  than  radiation  both 
from  the  cloud  and  upwelling  through  the  cloud,  need  not  be  made.  On 
the  other  hand,  assumptions  of  this  type  may  well  be  acceptable  for  low 
priority  areas.  This  dichotomy  suggests  strongly  that  upgrading  3DNEPH 
take  two  complementary  paths:  full  use  of  all  data  in  selected  areas, 
to  solve  important  problems  (cloud  amount,  phase,  ice  vs.  water),  directly, 
and  efficient  feature  extraction  for  all  other  cases. 

In  the  satellite  IR  data  processor,  several  different  types  of 
computation  are  performed.  The  difference  between  surface  and  radiometer- 
sensed  temperature  is  used  to  infer  presence  of  clouds  (a  yes  or  no  type 
of  decision)  possibly  yielding  misinterpretation  of  low  fog  or  stratus 
as  a  clear  sky  condition.  Total  cloud  cover  is  computed  using  a  decision 
matrix  approach  based  on  temperature  differences.  These  tests  may  be 
sharpened  using  statistical  measures  of  cloud  presence  and  type. 


In  the  current  version  of  3D  Nephanalysis ,  infrared  data  is 
merged  with  other  data  in  several  ways.  If  video  data  is  also  available 
then  the  total  cloud  cover  estimate  is  based  on  the  most  recent  measure¬ 
ment.  If  both  types  of  measurement  were  obtained  at  the  same  time,  then 
the  total  cloud  cover  is  assumed  to  be  the  higher  of  the  two  computed 
values.  Both  of  these  procedures  can  be  analyzed  to  determine  if  a 
statistical  merging  technique  would  yield  more  accurate  results. 

1 .4  Previous  Techniques  for  Extraction  of  Cloud  Information 

Numerous  techniques  for  extracting  cloud  information  from  satel 
lite  data  have  been  developed  over  the  past  several  years.  Examples  of 
the  information  which  may  be  obtained  are  cloud  cover,  cloud  height  and 
thickness,  cloud  type,  phase  (ice  or  water),  layering  and  precipitation. 
In  the  remainder  of  this  subsection  some  of  the  most  significant  tech¬ 
niques  and  results  pertaining  to  the  problem  of  obtaining  cloud  descrip¬ 
tions  from  satellite  data  are  presented.  All  of  these  techniques  are 
based  on  the  assumption  that  temperature  decreases  monotonically  with 
increasing  altitude. 

1,4.1  Cloud  Cover 

There  are  several  techniques  for  estimating  the  amount  of  cloud 
cover  in  a  particular  region  being  scanned  by  passive  sensors.  One  tech 
nique  presently  used  in  3DNEPH  is  a  frequency  histogram  of  infrared 
observations.  Cloud  cover  can  be  estimated  from  the  probability  density 
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distribution  of  the  difference  between  earth  surface  temperature  and  the 
equivalent  observed  black-body  temperature,  using  the  10.5-12,5  pm  atmo¬ 
spheric  "window"  band.  If  it  is  assumed  that  clouds  are  of  one  type  with 
a  characteristic  cloud-top  temperature,  the  surface  temperature  is  constant 
and  the  clouds  are  black  with  respect  to  IR  flux,  then  the  ideal  frequency 
distribution  is  shown  in  Fig,  1,4,1, 


Figure  1.4.1.  Ideal  Frequency  Distribution 


If  the  observations  are  evenly  distributed  over  the  sensed  region,  then 
the  height  of  each  peak  is  proportional  to  the  fractional  cover  of  the 
thpe  associated  with  the  peak.  A  more  realistic  profile  is  given  in 
Fig.  1.4.2. 


Figure  1.4.2.  Realistic  Frequency  Distribution 


Now  the  fractional  cover  of  each  type  is  proportional  to  the  area  under 
each  mode.  This  technique  can  be  extended  to  include  two  or  more  cloud 
types.  NIMBUS  II  MRIR  channel  2  data  (10-11  pm)  has  been  analyzed  using 
this  technique  by  Lo  and  Johnson  (1971).  This  approach  has  also  been 
used  by  Rao  (1970) . 

In  other  studies,  Fritz  and  Rao  (1967)  used  measurements  at 
6  pm  and  10  pm  to  locate  regions  of  substantial  cloudiness.  Koffler, 
et  al.  (1973)  used  ITOS  I  data  to  obtain  cloud  amount.  The  data  used 
was  IR  data  at  11  pm  and  temperature  measurements  at  400  and  700  mb. 

It  was  found  that  there  was  a  tendency  to  overestimate  the  total  cloud 
amount, 

Rao  (1970)  proposed  a  simple  technique  for  cloud  cover  deter¬ 
mination.  Let  TD  =  {surface  temperature  -  radiation  temperature  from 
satellite  data}.  Then,  approximately 

TD=|JaZ,  (1.4.1) 

where  - ( aT/aZ )  is  the  lapse  rate,  assumed  constant,  and  aZ  is  the  height 
difference  between  the  radiating  surface  and  the  ground.  Large  values 
of  TD  indicate  thick  clouds  at  high  and  middle  levels,  while  small  values 
of  TD  indicate  relatively  cloud-free  conditions.  The  use  of  a  relative 
measure  such  as  TD  helps  to  eliminate  the  problem  of  cold  surface  temper¬ 
atures  (due  to  snow  and  ice)  which  give  low  absolute  temperature  readings. 
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One  reason  for  determining  the  percentage  of  cloud  cover  is  to 
simplify  calculation  of  other  meteorological  parameters  such  as  vertical 
temperature  profile.  The  fundamental  quantities  of  interest  are  the 
"cl ear-column  radiances",  which  are  the  equivalent  cloud-free  radiances. 
Using  these,  the  inverse  problem  is  solved  to  recover  the  temperature 
profiles.  The  cl  ear-column  radiance  at  frequency  v  may  be  estimated 
from  (Smith,  et  al , ,  1974) 


Ic(v) 


Ij(v)  -  NI 2 ( v ) 

itn  * 


where  Ij(v),  I2(v)  are  spatially-independent  measured  radiances  and 


N  = 


Ic(w)  -  I-^w) 


with  I-|(w),  I2(w)  selected  to  ensure  that  0  _<  N  £  1 ,  and  where  N.  is  the 
fractional  cloud  cover  in  the  region  producing  measurement  I ^ ( v ) .  Ic(w), 
the  clear  column  radiance,  is  obtained  by  using  the  10-11  pm  atmospheric 
window  for  IR  measurement.  Thus,  this  technique  can  be  used  to  infer  the 
relative  percentages  of  cloud  cover  in  regions  which  have  the  same  surface 
temperature  but  different  observed  equivalent  black-body  temperatures  due 
to  cloud  cover  differences. 


1.4.2  Cloud  Height  and  Thickness 


A  technique  due  to  Rao  (1970)  uses  TD  to  estimate  approximate 
cloud  height.  Figure  1.4.3  depicts  the  qualitative  relationship  bet¬ 
ween  frequency  distribution  and  temperature  for  several  typical  cloud 
heights.  Thus,  an  estimate  of  cloud  height  may  be  obtained  by  matching 
observed  data  to  the  curves  of  the  figure. 


Figure  1,4.3.  Hypothetical  Cumulative  Frequency  Distribution  of 
TD  (°K)  for  Various  Cloud  Heights 

Since  there  is  generally  very  little  water  vapor  above  the 
clouds,  a  radiometer,  in  the  presence  of  thick  clouds,  will  record  the 
energy  associated  with  the  cloud  top,  which  may  be  40°K  or  more  below 
the  surface  temperature.  Many  factors  influence  the  accuracy  of  the 
estimates,  including  cloud  amount,  surface  temperature,  and  vertical 
distribution  of  water  vapor  and  temperature. 
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In  their  study  using  TIROS  I  data,  Koffl er,  et  al .( 1 973)  were 
able  to  specify  "no  clouds",  "low  clouds",  "middle  clouds"  or  "high 
clouds",  using  IR  and  400  and  700  mb  temperatures  only.  However,  cirrus 
clouds  were  found  difficult  to  locate  and  there  was  a  tendency  to  under¬ 
estimate  cloud  height.  Reynolds  and  VonderHaar  (1973)  used  visible  data 
from  the  ATS-3  geosynchronous  satellite  to  infer  cumulus  cloud  height 
and  growth.  Cloud  height  derived  from  ground  radar  was  compared  to 
reflected  solar  radiance  in  the  green  region.  By  linearly  fitting  the 
data  it  was  found  that  a  correlation  of  0,88  existed,  implying  that 
cloud  height  could  be  accurately  inferred  directly  from  the  satellite 
data.  Griffith  and  Woodley  (1973)  found  that  cloud  height  and  brightness 
are  highly  correlated.  Ground  radar  observations  were  compared  with 
satellite  visual  photographs.  The  estimated  accuracy  was  1-2  km.  The 
results  of  their  study  implied  that  brightness-enhanced  satellite  imagery 
can  be  used  to  estimate  rainfall. 

Cloud  thickness  estimation  was  investigated  by  Lo  and  Johnson 
(1971)  who  found  that  clouds  with  large  vertical  extent  have  low  equi¬ 
valent  blackbody  temperatures. 

1.4.3  Cloud  Type 

Fritz  and  Rao  (1967)  found  that  cirrus  clouds  have  a  higher 
transmission  for  radiation  at  10  pm  than  for  6  pm  (about  50%  versus 
almost  opaque).  In  their  study,  Lo  and  Johnson  (1971)  found  that 
clouds  with  high  water  vapor  content  (either  in  the  liquid  or  frozen 
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state)  have  lowest  equivalent  blackbody  temperatures.  They  studied 
data  from  NIMBUS  II  MRIR  channels  1  (6. 4-6, 9  pm)  and  2  (10-11  pm).  The 
channel  2  temperature  difference  (TD2)  was  studied,  with  TO  =  {surface 
temperature  -  radiation  temperature  from  satellite  data}.  High  values 
of  TD2  indicated  regions  of  cumulonimbus  activity.  High  values  of  the 
channel  1  temperature  difference  (TD1)  were  found  to  indicate  cirrus 
cloudiness.  Using  both  channels  enabled  the  water  vapor  content  above 
low-level  clouds  to  be  estimated, 

Shenk,  et  al .  (1976)  used  four  channels  of  the  NIMBUS-3  MRIR 
to  identify  unique  individual  cloud  types  and  cloud  type  combinations 
over  tropical  oceans.  The  channels  used  were  0. 2-4.0  pm,  6. 5-7.0  pm, 
10-11  pm  and  20-23  pm  and  the  cloud  categories  that  were  identified 
included  cirrus,  cirrostratus ,  stratus  or  stratocumulus ,  and  cumulo¬ 
nimbus.  A  decision  matrix  is  set  up  to  determine  both  single  cloud 
types  and  appropriate  cloud  combinations.  The  method  may  be  extendable 
to  include  ocean  areas  in  middle  and  high  latitudes.  Channel  deletion 
was  studied  to  determine  sensitivity  of  the  solutions  to  loss  of  data. 
Deletion  of  the  20-23  pm  channel  apparently  had  no  adverse  effect  for 
tropical  regions  but  may  be  important  at  middle  latitudes.  With  just 
the  6. 5-7.0  pm  and  10-11  pm  channels,  there  was  a  tendency  to  over¬ 
estimate  the  cumulonimbus  cover  and  middle  cloud  cover  beneath  a  high 
cirrus  overcast.  With  just  the  6, 5-7,0  pm  channel  areas  with  reasonably 
dense  cirrus  were  mapped  successfully. 

Following  Anderson,  et  al .  (1974),  3DNEPH  currently  uses  a 


bivariate  distribution  of  IR  and  visible  satellite  data  to  infer  cloud 
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types.  Among  the  distinct  cloud  types  which  could  be  discriminated  were 
thin  cirrus,  dense  cirrus,  stratus  and  stratocumulus ,  and  cumulonimbus. 
Booth  (1973)  and  Sikula  (1974)  have  used  information  bearing  on  cloud 
size  to  help  in  cloud  type  classification.  Power  spectra  of  visible  and 
IR  data  have  been  combined  with  first-order  measures  such  as  percentile 
brightness  and  temperature  values,  to  provide  "features"  upon  which  a 
linear  discriminant  classifier  acted. 

1 . 5  Thrust  of  Study 

The  principal  thrust  of  this  study  is  the  development  of  effi¬ 
cient  and  robust  techniques  for  inference  of  meteorological  parameters 
from  multi  spectral  satellite-borne  spectrometers.  The  parameters  under 
consideration  include  cloud  parameters,  atmospheric  parameters  and 
surface  parameters,  over  both  ocean  and  land. 

It  is  argued  that  the  solution  involves  optimum  processing  of 
information,  in  which  available  a  priori  information  is  combined  with 
the  information  carried  by  the  remotely-sensed  data.  Since  both  types 
of  information  carry  uncertainty,  the  appropriate  methodology  is,  of 
necessity,  a  statistical  one. 

A  systematic  and  comprehensive  approach  to  this  problem  is  put 
forth  which  is  based  on  Bayesian  decision  analysis.  Specific  techniques 
are  developed  to  handle  both  continuous-valued  and  discrete-valued 
parameters.  This  approach  offers  several  distinct  advantages  over 
deterministic  methods  or  direct  statistical  approaches  such  as  regression 
techniques : 
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(1)  The  physics  of  radiative  transfer  may  be  used  directly 
through  use  of  well -understood  mathematical  models. 

(2)  Sensor  noise  can  be  treated  directly, 

(3)  The  a^  priori  information  and  the  information  carried  by 
the  available  data  are  combined  in  a  statistically  optimum 
manner. 

(4)  Nonlinearities  may  be  handled  in  a  very  direct  manner. 

(5)  Missing  data  or  degraded  measurements  are  handled 
directly  in  a  statistically  optimum  manner. 

The  approach  to  estimation  of  continuous-valued  parameters  is 
based  on  Kalman  filtering  theory,  A  particular  advantage  Is  that  the 
theory  is  easily  extendable  to  include  the  dynamic  problem,  in  which 
parameters  are  updated  in  real  time  whenever  measurements  become  available. 
This  is  possible  since  the  Kalman  filter  carries  along  not  only  the  best 
estimate  of  present  values  of  all  parameters,  but  also  an  estimate  of 
the  covariance  matrix  of  estimation  errors;  that  is,  the  Kalman  filter 
is  fully  recursive. 

The  approach  to  inference  of  discrete-valued  parameters  involves 
the  use  of  decision  trees  determined  by  nonparametric  statistical  methods. 
Their  performance  approaches,  aysmptotically,  the  Bayes  error  rate. 

With  the  Bayesian  approach  it  is  possible  to  compute  absolute 
upper  bounds  of  performance,  even  in  the  nonlinear  case,  for  a  given 
channel  set  as  a  function  of  the  a  priori  parameter  uncertainties  and 
the  sensor  noise  variances. 


This  approach  is  evaluated  to  assess  the  possibility  of 
incorporation  into  the  3DNEPH  program.  The  scope  of  the  study  has 
been  necessarily  restricted  in  several  ways.  First,  the  data  base 
was  generated  exclusively  from  detailed  simulations  at  ERT  and  AFGL 
in  order  to  procure  a  sufficiently  large  validated  data  set.  This 
allows  quantitative  evaluation  of  sensitivity  to  noise,  missing  data 
and  other  modeling  errors  and  direct  comparison  of  techniques.  The 
study  was  restricted  to  one  (vertical)  dimension;  i.e.,  the  horizontal 
field  was  assumed  to  be  absolutely  homogeneous  over  the  field  of  view 


of  each  sensor. 


II.  PHILOSOPHY  OF  APPROACH 


In  previous  sections,  various  specialized  techniques  for  cloud 
identification  using  satellite  spectral  data  were  discussed.  These 
techniques  have  been  developed,  expanded,  and  improved  as  different 
types  of  data  (visible,  IR,  microwave,  etc.)  have  become  available. 

Most  of  the  techniques  and  algorithms  were  first  developed  to  handle 
a  single  type  of  data.  Some  of  these  techniques  have  been  augmented 
to  include  more  than  one  type  of  sensor,  but  a  unified  approach  to  the 
problem  is  still  lacking.  In  this  section,  we  discuss  a  statistical 
methodology  for  combining  information  from  different  sources  and  for 
making  optimum  decisions.  The  basis  of  this  methodology  is  Bayesian 
Decision  Analysis.  The  reasons  for  considering  statistical  approaches 
are  that  observations  are  contaminated  with  noise,  different  measure¬ 
ments  have  different  accuracies  and  models  contain  random  errors. 

2.1  Bayesian  Decision  Analysis 

Statistical  inference  and  decision  analysis  involve  combining 
different  sets  of  observations  to  compute  probabilities  of  occurrence 
of  different  events.  In  the  Multispectral  Cloud  Identification  Study, 
the  observations  are  represented  by  radiance  measurements  at  different 
frequencies  and  the  events  are  represented  by  different  cloud  types, 
heights,  phases,  mass  and  other  parameters.  It  is  possible  to  define 
a  vector  of  parameters  for  a  cloud  such  that  the  values  taken  by  the 
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vector  may  be  called  a  state  vector  of  the  cloud  and  will  be  denoted 
by  x.  Notice  than  an  element  of  x  corresponding  to  cloud  height  will 
be  a  continuous  variable.  Other  elements  of  x  would  include  cloud 
phase,  thickness,  amount  and  extent. 

We  will  denote  by  y  the  vector  of  spectral  measurements  from 
the  satellite  during  one  pass.  Generally,  y  would  correspond  to  measure¬ 
ments  over  a  single  field  of  view  since  this  is  the  easiest  case  to 
consider.  In  certain  cases,  better  cloud  identifications  may  be  obtained 
by  considering  observations  and/or  their  statistics  based  on  a  number  of 
adjacent  fields  of  view.  It  is  assumed  here  that  the  raw  observations 
have  been  processed  before  they  are  incorporated  in  the  vector  y. 

The  basic  calculation  of  Bayesian  Inference  is  to  obtain  the 
posterior  probability  distribution  of  x  given  y,  denoted  mathematically* 
by  p(x|y).  The  celebrated  Bayes  Formula  gives  this  as 


p(x|y) 
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(2.1.1) 


Equation  (2.1.1)  performs  elegantly  the  "inversion"  process  used 
in  many  applications  including  IR  inversions  to  obtain  temperature  profiles. 
The  probability  p(y|x)  denotes  the  conditional  probability  of  observing  y 
given  that  x  has  occurred.  For  example,  if  x  corresponds  to  a  high  cirrus 
cloud,  then  p(y|x)  is  specified  using  theoretical  and  empirical  radiative 
transfer  models  for  cirrus  clouds.  Thus,  the  solution  to  the  "direct" 
or  "forward"  problem  along  with  a  sensor  noise  model  specifies  p(y|x). 

The  probability  p(y|x)  is  defined  for  all  values  of  x  in  the  set  of 
relevant  cloud  parameter  values. 


*A11  marginal  probability  distributions  are  denoted  by  p(*) 
and  conditional  distributions  as  p(*|*). 


The  probability  p(x)  denotes  a  priori  probability  of  x  before 
y  is  observed.  It  may  be  obtained  using  climatological  data  and  measure¬ 
ments  from  previous  passes.  Global  cloud  models  that  have  been  developed 
over  the  last  decade  and  certain  known  properties  such  as  height  and 
extent  of  different  cloud  types  will  be  used  to  define  the  probability 
p(x).  It  should  be  pointed  out  that  some  of  the  temperature  inversion 
techniques  make  use  of  prior  data  but  do  so  only  in  a  heuristic  fashion. 

Bayes  formula,  viz.  Eq.  (2.1.1),  computes  the  probability  of  x 
given  y,  i.e.  the  probability  of  each  event  x  given  a  particular  obser¬ 
vation  y.  In  this  sense,  it  inverts  the  "direct"  probability  p(y|x), 
which  has  to  be  defined  over  the  space  of  all  possible  events  denoted 
by  nx. 

Once  the  posterior  probabilities  p(x|y)  are  obtained,  a  decision 
as  to  the  most  likely  x  may  be  obtained  by  simply  picking  the  x  value 
with  the  highest  posterior  probability.  For  certain  other  applications, 
the  mean  or  the  median  of  p(x|y)  may  be  a  more  appropriate  value  to  use. 

Advantages  of  the  Bayesian  Decision  Analysis  Approach 

The  Bayes  Decision  Analysis  approach  is  a  comprehensive  approach 
and  includes  specialized  approaches  such  as  Regression,  Decision  Trees  and 
Minimum  Information  methods  as  special  cases.  In  the  statistical  liter¬ 
ature,  it  is  recognized  to  be  the  best  approach  for  combining  information 
from  different  sources.  It  has  been  used  extensively  in  system  identifi¬ 
cation,  hypothesis  testing,  communication  systems,  and  navigation  systems. 
For  example,  the  navigation  system  of  modern  fighter  aircraft  combines 
inertial  and  sensor  information  using  a  Kalman  Filter  which  is  based  on 
the  Bayes  Formula. 


The  Bayesian  Decision  Analysis  approach  is  also  flexible  in 
the  sense  that  new  types  of  measuremercs  car,  be  added  without  changing 
the  structure  of  the  algorithm.  For  computational  efficiency,  it  is 
customary  to  choose  probabilities  p(y|x)  and  p(x)  from  the  so-called 
"reproducing  conjugate"  family  which  has  the  property  that  the  prior  and 
the  posterior  distributions  are  of  the  same  type,  For  example,  if  p(x) 
is  chosen  as  multivariate  normal  and  p(y|x)  is  also  multivariate  normal, 
then  p(x|y)  automatically  turns  out  to  be  multivariate  normal  also.  In 
this  case,  only  the  means  and  covariances  have  to  be  computed  and  the 
integration  in  the  denominator  of  Eq.  (2.1.1)  is  performed  analytically. 

2.2  Estimation  of  Continuous-Valued  Parameters 

Parameters  such  as  cloud  height,  cloud  thickness,  integrated 
liquid  water  content,  integrated  water  column,  ocean  surface  wind  speed 
and  rain  rate  may  be  regarded  as  continuously  variable  between  certain 
limits.  These  parameters  may  be  estimated  using  the  tools  of  modern 
estimation  theory,  in  particular,  the  Kalman  Filter  and  some  of  its 
variations.  This  technique  is  a  way  of  estimating  variables  which  change 
with  time,  as  do  all  of  the  variables  of  interest  in  this  study.  However, 
in  this  study,  we  have  assumed  all  variables  to  be  static;  nevertheless, 
the  Kalman  Filter  provides  an  appropriate  methodology  for  the  static 
roblem,  and  possesses  the  advantage  of  being  extendable  in  a  straight¬ 
forward  manner  to  the  dynamic  problem  at  a  later  time.  For  this  reason, 
a  discussion  of  the  Kalman  Filter,  in  its  full  dynamic  context  is  now  given. 


2.2.1 


Kalman  Filtering  Method  [Kalman,  1960] 

The  Kalman  Filter  approach  to  estimation  is  based  on  certain 


assumptions  made  on  the  nature  of  the  variables  to  be  estimated,  the 
measurement  process  and  noises  corrupting  the  system.  The  variables 
to  be  estimated  are  collected  together  in  a  state  vector  (x)  which 
evolves  as  a  Gauss-Markov  process  with  known  statistics.  The  measure¬ 
ments  are  constrained  to  be  linearly  related  to  the  state  and  are 
corrupted  by  additive  white  Gaussian  noise  with  known  statistics. 

Under  these  assumptions,  the  conditional  mean  (minimum-variance  estimate) 
of  x,  given  all  of  the  data,  may  be  computed  recursively.  The  error 
covariance  matrix  of  the  estimate  is  also  computed  recursively,  but 
is  independent  of  the  values  of  the  measurements.  The  conditional 
mean  is  Gaussianly  distributed  and  thus  all  statistical  information 
regarding  the  state  is  available. 

The  generality  of  the  state  variable  formulation  and  the  ease  of 
filter  implementation  on  digital  computers  has  led  to  its  great  popularity, 
especially  in  aerospace  problems.  Multivariate  estimation  problems  can 
be  handled  systematically  for  either  time-varying  or  time-invariant 
systems.  A  wealth  of  experience  has  been  gained  over  the  past  fifteen 
years  in  solving  many  diverse  problems.  These  include  spacecraft, 
marine  and  aircraft  navigation,  signal  processing,  industrial  process 
control,  economic  forecasting,  satellite  orbit  determination,  and 
image  processing. 

The  cloud  identification  problem  is  an  example  of  one  in  which 
the  assumptions  of  Gaussianness,  linearity  and  known  noise  statistics 
do  not  strictly  hold.  In  this  case,  extended  Kalman  filtering  theory. 
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which  is  discussed  in  Section  2.4,  is  expected  to  produce  better  results 
than  other  techniques  such  as  the  statistical  D-method,  the  statistical 
eigenvector  method,  or  regression  techniques.  The  reason  for  this  is 
that  the  Kalman  filter  formulation  allows  a  much  more  complete  statistical 
description  of  the  problem  to  be  utilized.  The  Kalman  filter  formulation 
is  based  on  a  state-space  representation  of  the  system  to  be  analyzed. 

We  now  briefly  describe  the  state-space  approach. 

2.2.2  State  Space  Models 

State  space  models  of  random  processes  are  based  on  the  Markov 
property  which,  in  simple  terms,  implies  the  independence  of  the  future 
of  the  process  from  its  past  given  the  present  state.  In  other  words, 
the  state  of  a  Markov  process  summarizes  all  of  the  information  from  the 
past  that  is  nessary  to  predict  its  future.  For  obvious  reasons,  only 
the  case  where  the  state  vector  is  fintte-dimensional  is  of  practical 
interest.  A  general  state  vector  model  is  typically  specified  in  terms 
of  the  following  five  quantities  (see  also  Figure  2.2,1): 

(i)  three  vectors,  respectively,  of  input,  output  and 
internal  state  variables; 

(ii)  a  rule  for  transformation  of  the  state  vector 
from  one  time  instant  to  the  next; 

(iii)  a  relationship  between  the  input,  output  and 
state  variables; 

(iv)  initial  state; 

(v)  joint  statistics  of  all  random  variables, 

Mathematically, 


x(k+l)  =  f(x(k),k)  +  G(k)  w(k) 
y(k+l)  =  h(x(k+l),  k+1)  +  v(k) 


(2,2,1) 

(2.2.2) 


Figure  2.2.1  State  Vector  Model 


where  x(k)  is  the  n *  1  state  vector  at  time  t^,  w(k)  is  the  nx  1 
process  noise  vector,  y(k)  is  the  mxl  measurement  vector,  v(k)  is 
the  mx  1  measurement  noise  vector  and  w(k)  are  assumed  to  be  uncorre¬ 
lated  white  noise  sequences  with  known  distributions.  The  function 
f(x(k),k)  represents  the  deterministic  portion  of  the  state  transition 
from  time  k  to  k+1 .  The  function  h(x(k+l ) ,k+l )  represents  the  noiseless 
mapping  of  the  state  space  into  the  measurement  space.  In  the  general 
case,  both  functions  may  be  nonlinear.  Similarly,  the  distribution 
of  the  initial  state  x(0)  is  assumed  known. 

At  present,  our  study  has  considered  processing  data  at  a 
single  time.  Thus,  in  the  sequel  we  will  neglect  the  time  dependence. 
This  simplifies  the  discussion  to  follow.  However,  it  is  important 
to  point  out  that  the  problem  of  processing  measurements  taken  at 
different  times  and  locations  can  be  attacked  using  the  general  approach 
now  described. 

The  linear  form  of  Eq,  (2.2.2)  is  of  special  importance  and 
is  written  as 

y  =  Hx  +  v.  (2.2.3) 

If  v  and  x  are  Gaussian  then  so  is  y.  The  state  is  assumed  Gaussian 

with  mean  x  and  covariance  matrix  P„.  The  noise  v(k)  is  assumed  to 
o  o 

have  zero  mean  and  covariance  matrix  V.  The  measurement  matrix  H  is 
deterministic,  The  principal  advantage  of  this  representation  is  that 
the  posterior  distribution  p(x|y)  of  x,  given  y,  turns  out  to  be 
Gaussian  and  its  first  two  moments  are  computed  by  the  Kalman  filter. 


2.2.3  Kalman  Filter  Equations 


Denote  the  mean  and  covariance  of  the  Gaussian  density 
function  p(x|y)  by  x  and  P,  respectively.  Then  the  Kalman  Filter 
equations  are  given  as  follows; 


x  =  xQ  +  Mv 

(2.2.4) 

v  =  y  -  H  xQ 

(2.2.5) 

m  =  p  ht  r1 

0  L 

(2.2.6) 

Y  =  H  P  HT  +  V 
u  0 

(2.2.7) 

P  =  P  ■  M  H  P„ 

0  0 

(2.2.8) 

where  v,  the  "innovations,"  represents  the  differences  between  the 
predicted  and  actual  measurements.  This  quantity  may  be  regarded  as 
the  new  information  brought  in  by  the  observation  y.  The  quantity  £ 
represents  the  covariance  matrix  of  the  innovations  and  M  is  a  given 
matrix  (often  called  the  Kalman  gain)  that  is  used  to  combine  the 
innovations  with  the  prior  estimate  xQ  to  yield  the  estimate  x. 
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2.2.4  Properties  of  Kalman  Filters 

Kalman  Filters  possess  a  number  of  desirable  properties  which 
are  of  practical  interest.  These  properties  are  discussed  briefly  here. 

Optimality.  The  Kalman  filter  estimate  x  is  the  minimum  vari¬ 
ance  estimate  of  x  given  the  measurement  y.  In  addition,  since  the  mean, 
mode  and  median  of  the  Gaussian  density  are  identical,  x  is  optimal  for 
a  whole  class  of  symmetric  loss  functions.  Since  the  Kalman  Filter 
computes  the  first  two  moments  which  completely  characterize  a  Gaussian 
random  variable,  all  statistical  information  regarding  x  contained  in 
the  measurement  y  is  available  from  the  Kalman  filter  estimate.  For 
example,  the  covariance  matrix  P  provides  confidence  limits  for  the 
estimate  x. 

Innovation  Property,  The  one-step  ahead  prediction  error  v 
is  known  as  the  innovation  since  it  represents  new  information  brought 
into  the  filter  estimate  by  the  measurement  y.  The  innovation  has  the 
interesting  property  that  for  an  optimal  filter  it  is  a  zero  mean 
Gaussian  white  noise  with  covariance  matrix  z.  This  property  can  be 
used  to  test  the  optimality  of  Kalman  Filters  in  appl ■’cations,  to 
detect  changes  in  the  model,  to  throw  out  bad  data,  and  to  build 
adaptive  and  robust  Kalman  filters.  It  can  be  shown  that  the  inno¬ 
vation  contains  as  much  information  concerning  the  state  as  the 
original  measurement  seauence. 

Numerical  Properties.  The  Kalman  filter  equations  can  be  written 
in  different  forms  with  different  numerical  properties.  It  turns  out  that 
equations  (2.2.4)  -  (2.2,8),  known  as  the  covariance  form  of  the  optimal 


filter,  though  physically  easiest  to  comprehend,  are  not  best  suited 
for  numerical  computation.  In  systems  with  widely  separated  eigenvalues 
of  the  estimation  error  covariance  matrix,  round-off  errors  can  lead 
to  non-negative  definite  covariance  matrices.  A  solution  to  this 
problem  is  obtained  by  using  Cholesky  square-roots  of  the  covariance 
matrices.  Consider  the  nxn  matrix  S  defined  by 

P  =  S  ST  (2.2.9) 

which  is  usually  computed  in  either  upper  or  lower  triangular  form. 
Equations  may  be  developed  for  computing  S  and  the  covariance  matrix 
is  then  computed  from  (2.2.9).  Since  the  condition  number  of  S  is  half 
that  of  P,  improved  accuracy  is  obtained  on  finite  word  machines.  In 
particular,  the  accuracy  of  single-precision  square-root  Kalman  filters 
is  comparable  to  that  of  double  precision  Covariance  Kalman  filters. 

For  any  real  S,  SST  is  guaranteed  to  have  no  negative  eigenvalues. 

2.2.5  Kalman  Filter  Approximations  in  Nonlinear  Problems 

In  reality,  the  brightness  temperature  measurements  will  not  be 
linear  in  the  state.  In  this  case,  the  Kalman  filter  equations  of 
Section  2.2.3  do  not  strictly  apply  and  some  approximations  must  be  made. 
Several  techniques  have  been  developed  to  handle  such  cases.  The 
Extended  Kalman  filter  uses  a  linearization  approach  around  the  present 
estimated  value  of  the  state.  The  extended  Kalman  filter  equations 
for  a  nonlinear  measurement  are  similar  to  Eqs.  (2.2.3)  -  (2.2.8)  with 
the  following  exceptions: 

(i)  The  innovation  v  is  of  the  form 

v  =  y  -  h(x0)  (2.2.10) 


instead  of  (2.2.5). 


(ii)  The  measurement  matrix  H  is  replaced  by 


3h(x) 

3X 


x=x 


0 


(2.2.11) 


Considerable  experience  has  been  gained  over  the  past  fifteen  years  in 
applying  the  extended  Kalman  filter  to  a  variety  of  problems,  most 

notably  in  the  aerospace  field.  Results  indicate  that  filter  performance 

a2 

is  quite  good  for  problems  in  which  — y  P  <  V.  This  symbolic  inequality 
means  that  the  expected  effect  of  the  second-order  terms  in  the  Taylor 
expansion  around  the  estimate  is  "smaller"  than  that  of  the  sensor 
uncertainty.  Thus,  good  performance  may  be  expected  when  either  the 
nonlinearity  is  small  or  the  error  variances  are  small.  Although  higher- 
order  expansions  can  be  used  to  form  other  approximate  nonlinear  filters, 
these  have  generally  not  worked  out  in  practice  due  to  problems  of 
filter  divergence.  For  more  highly  nonlinear  problems  in  which  the 
extended  Kalman  filter  does  not  work  well,  the  best  approaches  appear 
to  be  to  either  transform  the  state  to  make  the  measurements  more 
linear  or  use  the  Iterated  Extended  Kalman  filter.  In  this  approach, 
the  nonlinear  measurement  function  h(*)  is  re-evaluated  at  the  estimate 
x  after  processing  y.  A  new  estimate  is  then  found  and  again  used  to 
re-evaluate  h(*)-  This  process  is  continued  until  convergence  is 
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2 . 3  Estimation  of  Discrete-Val ued  Parameters 

Parameters  such  as  cloud  type  and  surface  type  are  conve¬ 
niently  expressed  as  Boolean  variables.  That  is,  they  may  take  only 
a  few  discrete  (assigned)  values.  Cloud  types  may  be  expressed  as 
"clear"  (no  clouds),  "water  cloud"  or  "ice  cloud"  and  the  surface  may 
be  categorized  as  ice,  snow  or  loam,  for  example.  The  problem  of 
detecting  this  type  of  random  variable  is  distinctly  dif^-ent  t^an 
estimating  continuous  random  variables,  due  to  the  fa<  t  ••  ••  *nev  are 
discrete-valued. 

In  the  context  of  the  present  study,  this  problem  is  a  simple 
type  of  scene  classification  problem.  Its  simolicity  derives  from  the 
fact  that  the  scene  is  assumed  to  be  homogeneous.  Thus,  the  cloud  type 
or  type  of  surface  cover  is  assumed  to  be  invariant  over  the  entire  scene 
This  assumption  is  a  reasonable  one  for  a  study  of  this  typo,  since 
feasibility  of  classification  must  first  be  established.  Once  this  has 
been  shown,  attention  can  be  given  to  the  more  difficult  problem  of 
nonhomogeneous  scenes.  With  this  assumption,  the  problem  is  reduced 
to  one  of  pattern  recognition,  which  is  now  discussed  briefly. 

2.3.1  Classification  Using  Pattern  Recognition 

The  pattern  recognition  problem  is  generally  solved  in  two  steps 
(1)  feature  extraction,  (2)  classification  in  feature  space.  Feature 
extraction  may  be  viewed  as  a  process  of  selecting  a  suitably-transformed 
subset  of  the  raw  data  for  use  in  the  classifier.  Ideally,  one  wishes  to 
keep  only  the  most  informative  data  and  throw  away  the  rest.  Once  the 
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features  are  extracted,  classification  takes  place  in  the  selected 
feature  space. 

Pattern  classification  can  be  accomplished  using  both  parametric 
and  non-parametric  methods.  For  the  problems  under  consideration  here, 
there  is  expected  to  be  significant  overlap  between  classes  and  the  classes 
may  be  non-convex  in  feature  space.  This  makes  it  difficult  to  accurately 
parametrize  the  class  probability  density  functions  for  the  purpose  of 
classification.  Thus,  the  use  of  nonparametric  methods  is  indicated. 

There  are  several  significant  problems  associated  with  pattern 
recognition  that  are  of  importance  in  this  research.  The  most  significant 
ones  are: 

1)  finite  data  base 

2)  unknown  class  conditional  densities 

3)  overlapping  classes 

4)  undefined  feature  space 
Finite  Data  Base 

The  effect  of  a  finite  data  base  is  very  significant  since  the 
pattern  recognition  problem  is  a  statistical  one.  It  is  important  that 
there  is  enough  data  available  for  training  and  testing  the  pattern 
recognition  algorithm  to  ensure  a  high  degree  of  confidence  in  the 
numerical  results  obtained  from  the  available  data.  The  exact  number 
of  samples  required  per  class  is  problem-dependent  and  also  depends  on 
the  way  in  which  the  data  is  acquired. 
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Unknown  Densities 

Statistical  analysis  requires  either  a  set  of  parameterized 
probability  densities  or  an  accurate  set  of  sample  statistics.  In  the 
problems  to  be  addressed  here,  parameterized  models  are  often  of  ques¬ 
tionable  validity  and  nonparametric  methods  are  more  appropriate.  A 
variety  of  methods  is  available  for  representing  the  class-conditional 
density  functions  nonparametrical ly  in  terms  of  the  actual  samples. 

We  have  selected  a  method  based  on  cumulative  marginal  distribution 
functions,  which  are  particularly  simple  to  determine. 

Overlapping  Classes 

Another  significant  problem  encountered  in  pattern  recognition 
is  the  question  of  how  to  handle  overlapping  classes.  In  Fig.  2.3.1,  two 
overlapping  classes  are  depicted  on  the  left.  The  classes  are  overlapping 


overlapping  classes 


Figure  2.3.1  Overlapping  Classes 


since  their  convex  hulls  intersect.  The  decision  rule  for  separating 
them  is  not  linear,  but  must  be  either  nonlinear  or  piecewise-1 inear. 

Undefined  Feature  Space 

There  does  not  appear  to  be  a  systematic  approach  to  feature 
extraction  for  this  problem.  One  approach  is  to  use  intuitively-derived 
features  and  these  have  much  merit  since  they  are  based  on  experience  and 
the  human  pattern  recognition  capability.  However,  the  digital  computer 
is  capable  of  performing  some  operations  of  nonlinear  transformation, 
correlation  and  manipulation  of  large  amounts  of  data  which  cannot  be 
matched  by  the  human.  This  computing  power  should  be  brought  to  bear 
in  multivariate  multiclass  problems  in  order  to  perform  data  compression 
and  feature  extraction  for  optimal  and  accurate  classification.  The 
approach  used  here  has  utilized  a  large  set  of  potential  features  which 
have  then  been  analyzed  statistically  to  determine  the  ones  with  the 
most  discriminating  power. 

2.3.2  Nonparametric  Decision  Rules 

Supervised  pattern  recognition  methods  may  be  divided  into  two 
types:  (1)  parametric,  (2)  nonparametric.  In  parametric  methods,  the  form 
of  the  underlying  probability  density  functions  are  assumed  known  and 
are  parametrized  by  a  small  number  of  parameters.  However,  in  practice 
this  assumption  is  usually  suspect.  In  particular,  all  classical  densities 
are  unimodal ,  whereas  multimodal  densities  sometimes  occur  in  practice. 

The  use  of  nonparametric  methods,  in  which  the  power  of  modern  computers 
is  more  fully  utilized,  is  suggested. 


T 
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The  nonparametric  classification  problem  may  be  stated  as 
follows.  An  n-dimensional  random  vector  x  of  observations,  or  features, 
is  assumed  to  belong  to  one  of  m  classes  ,  r^,  ....  rm  characterized 
by  a  set  of  unknown  class-conditional  probability  density  functions 
p(x|r-j),  p(x|rm),  where  p(A|B)  denotes  the  probability  of  A  given 
that  B  has  occurred.  The  m  classes  have  known  prior  probabilities 

tt  i ,  iTgf  ....  irm.  We  are  given  a  set  of  N  labeled  vectors  x  =  {x-j,  x ? . x^ 

The  Droblem  is  to  determine  a  decision  rule  on  the  basis  of  x  and  the 
labels  for  discriminating  between  the  m  classes. 

There  are  several  different  approaches  to  nonparametric  pattern 
recognition.  One  approach  is  to  estimate  the  class  conditional  density 
functions  p(x|ri)  from  the  data  using  the  methods  of  Parzen  (1962), 

Wagner  (1975)  or  Loftsgaarden  and  'JUesenberry  (1965) ,  for  example.  Another 
procedure  is  to  estimate  the  a  posteriori  class  probabilities  p(r^|x) 
directly  from  the  data.  This  aporoach  is  closely  related  to  the  k-nearest 
neighbor  decision  rules  which  do  not  compute  p(x|rl-)  but  produce  a 
decision  rule  directly.  This  latter  approach  is  closest  in  spirit  to 
the  method  used  in  this  study. 

2, 3. 2.1  k-Nearest  Neighbor  Rules 

Perhaps  the  most  popular  nonparametric  decision  rules  are  the 
k-nearest  neighbor  rules  first  investigated  by  Fix  and  Hodges  (1951). 

Training  samples  from  the  m  populations  are  gathered  together  into  a  single 
population,  with  each  member  labeled  according  to  the  class  from  which  it 
originated.  Now  suppose  it  is  desired  to  classify  a  new  point,  x,  on  the 
basis  of  the  data  in  the  training  set.  The  k  closest  points  to  x  are  found 
and  x  is  assigned  to  the  class  with  the  largest  representation  in  the  set 
of  k  points.  Fix  and  Hodges  were  able  to  show  that  this  rule  is  asymp¬ 
totically  Bayes  efficient  as  k  and  N,  the  number  of  samples  in  the  training 
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set,  approach  infinity  if 


lim  k(N)  =  » 
N  -*■ » 


lim  • 
N  -*  “> 


mi  = 

N 


0 


Cover  and  Hart  (1961)  have  investigated  this  rule  for  fixed  k 
and  derived  bounds  for  the  simplest  case  of  k =  1  (one-nearest  neighbor 
decision  rule).  The  result  is  that  the  error  rate  Pg  for  m  populations 
is  bounded  asymptotically  by 

★ 


p!  <  PQ  <  p!  (2 


) 


*  * 
where  Pg  is  the  Bayes  error  rate.  This  result  implies  that  P&  <  2Pg, 

independent  of  the  number  of  populations. 

2. 3. 2. 2  Nonparametric  Partitioning 

Partitioning  using  hierarchical  decision  trees  is  another  approach 
to  nonparametric  supervised  learning.  This  approach  has  been  employed  by 
several  workers.  Henrichon  and  Fu  (1969)  considered  the  problem  of  par¬ 
titioning  the  real  line  on  the  basis  of  labeled  samples.  They  allowed 
partitions  associated  with  each  class  to  be  disjoint.  That  is,  a  single 
class  was  not  necessarily  constrained  to  a  single  interval.  The  multi¬ 
dimensional  problem  was  handled  by  considering  each  dimension  independently 
and  then  building  up  a  total  score  on  the  basis  of  classification  in  each 
dimension.  Meisel  and  Michalopoulos  (1973)  attacked  the  problem  of  finding 
decision  trees  which  tend  to  minimize  the  average  number  of  comparisons  to 
arrive  at  a  given  decision.  They  developed  a  dynamic  programming  solution 
to  the  problem.  Unfortunately,  the  number  of  trees  which  must  be  evaluated 
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3 

is  of  the  order  of  N  /6  where  N  is  the  total  number  of  samples.  Thus, 
for  even  moderate-sized  data  bases,  dynamic  programming  solutions  are 

Q 

not  possible.  For  example,  if  N  =  1000,  then  10  /6  trees  would  have  to 
be  evaluated. 

2. 3. 2. 3  A  New  Partitioning  Rule 

The  method  used  in  this  study  is  a  new  nonparametric  partitioning 
decision  tree  rule  developed  by  Friedman  (1976).  This  method  has  been 
applied  successfully  to  electrocardiogram  classification  by  Gustafson 
et  al.  (1978)  and  appears  to  have  strong  general  merit.  It  has  the 
following  desirable  properties: 

(i)  The  procedure  is  computationally  simple  for  both  training 
and  classification  phases.  This  is  of  great  importance  in 
the  present  problem,  due  to  its  high  dimensionality. 

(ii)  The  decision  rule  that  results  from  the  training  phase  is 
asymptotically  Bayes'  risk  efficient  as  the  number  of 
training  samples  increases  without  limit. 

(iii)  Additional  features  may  be  easily  added  prior  to  or  during 
the  training  phase.  These  may  be  linear  or  nonlinear.  Non- 
informative  features  are  simply  ignored  in  the  training  phase. 
However,  any  of  the  additional  features  which  are  informative 
are  used,  and  become  part  of  the  partitioning  algorithm. 

The  severest  limitation  is  that  it  was  derived  for  the  two-class 
problem  only.  Although  it  was  suggested  that  the  M-class  problem  be 
treated  as  a  series  of  M  two-class  problems,  this  is  not  completely 
straightforward  in  practice.  Nevertheless,  the  method  was  selected  as 
the  best  overall  approach  to  the  problem. 
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For  the  purposes  of  presenting  the  two  class  problem,  we  will 
designate  the  two  classes  1  and  2.  Further,  we  will  use  the  fact  that 
«-■] tt-j  =  1^2  where  is  the  cost  of  misclassifying  class  i  and  ir.  is  the 
a  priori  probability  of  class  i.  Extensions  to  a  more  general  set  of 
cost  functions  are  straightforward  and  will  be  presented  later  in  this 
section.  We  will  also  assume  that  associated  with  both  classes  is  a  set 
of  features  { } .  In  the  problem  of  multispectral  classification,  these 
features  may  be  radiances,  brightness  temperatures,  or  their  gradients, 
ratios,  second  differences,  etc.  That  is,  the  features  may  consist  of 
any  function  of  the  observed  data  that  we  suspect  of  significance  in 
discriminating  the  classes. 

Given  a  single  feature,  x..,  of  this  set,  we  now  seek  a  simple 
decision  rule  that  partitions  the  classes  on  the  value  of  x^  That  is, 
we  desire  the  value  of  x^,  say  xT,  such  that  we  place  the  observation 
into  one  class  if  xi  <  xT  and  the  other  if  x^  >  xf  in  such  a  way  as  to 
minimize  the  probability  of  error.  For  a  given  feature,  x^  it  may  be 
shown  that  the  value  xf  is  the  value  that  maximizes  the  distance  between 
the  marginal  cumulative  distribution  functions  (CDF)  or  Kolmogorov-Smirnov 
Distance  (KS  Distance).  That  is,  if  F^(x.)  is  the  marginal  CDF  of  Class  1 
in  x.  space  and  ( x^. )  the  CDF  of  Class  2  in  x^  space,  xT  maximizes  the 
quantity: 

D1(x1)  =  | F 1 ( x . )  -  F2(x.)|  (2.3.1) 

where  D^x^)  is  the  KS  Distance.  To  show  that  this  is  the  appropriate  decision 
point,  let  us  assume  that  F1 (x^ )  >  F2(x. )  for  all  x^  and  that  the  decision  rule 
is  based  on  some  point  x,!.  This  rule  will  be  to  choose  Class  1  if  x^  <  and 
choose  Class  2  if  x.  >  x!.  Then  the  expected  cost  of  a  decision  is: 
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C.  =  ^>1  Tr-j  Pr{x.  >  xj  |  Cl  ass  1}  +  1^2  Pr  ( x-|  I  Class  2} 


Since  it  is  assumed  that  =  *,2112  =  k  this  cost  is 


(2.3.2) 


Ci  =  k C(1  -  F1  (x'.))  +  F2(x!)] 


=  k[l  -  D.(x!)] 

The  cost  is  thus  minimal  at  the  xj  that  maximizes  D^  (x..). 


(2.3.3) 


If  a  number  of  features  x^  exist,  the  optimum  choice  of  a  feature 
on  which  to  partition  the  data  is  the  one  that  yields  the  minimum  cost  C^. 

By  the  preceding  analysis,  this  feature  is  the  one  with  the  maximum  D^xt). 

Having  partitioned  the  original  set  6f  observations  into  two  new 
sets  on  the  basis  of  one  feature,  it  is  now  possible  to  reapply  this  parti¬ 
tioning  rule  to  each  of  the  new  sets  if  further  discrimination  is  warranted. 

The  decision  as  to  what  constitutes  a  terminal  node  is  somewhat 
arbitrary.  Since  the  actual  CDF's  of  the  features  will  be  unknown  in 
general,  the  algorithm  must  rely  on  sample  CDF's  to  perform  the  partitions. 

The  confidence  intervals  attached  to  such  sample  statistics  are  a  function 
of  the  number  of  observations  entering  into  their  computation.  The  defini¬ 
tion  of  a  terminal  node  has  been  historically  taken  to  be  a  node  that  contains 
only  members  of  a  single  class  or  a  node  that  has  too  few  observations  to 
allow  the  CDF's  to  be  meaningful. 

The  label  on  a  terminal  node  or  leaf  is  given  by  the  minimum  cost 
decision  at  that  node.  For  each  class,  there  is  a  probability  that  a  sample 

will  arrive  at  the  leaf  j.  The  cost  of  labeling  a  leaf  class  1  is  thus: 


Ctdecide  1  at  node  j)  -  r' j) 

and  the  cost  of  labeling  a  leaf  class  2  is 

Cldectde  2  at  node  J)  -  ‘jf^ 
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The  node  is  labeled  with  the  class  that  gives  the  lower  cost. 

For  the  purposes  of  the  study  undertaken  in  this  section,  the 
terminal  nodes  were  defined  by  one  of  two  criteria.  A  node  was  considered 
terminal  if  either: 

i)  only  one  class  appears  at  a  node  or 
ii)  the  maximum  distance  D^xt)  does  not  reject  the  null 

hypothesis  of  the  Kolmogorov-Smirnov  two  sided -two  sample 
test  at  greater  than  the  90  percent  confidence  level. 

The  first  criterion  is  an  obvious  one.  The  second,  however,  warrants 
some  further  discussion. 

The  Kolmogorov-Smirnov  two  sided -two  sample  test  is  a  powerful 
yet  simple  nonparametric  test  for  hypothesis  that  two  samples  are  drawn 
from  different  populations  (e.g.,  Conover,  1971).  The  test  is  based  on 
order  statistics  and  uses  the  maximum  KS  Distance  D^(xt)  as  its  sufficient 
statistic.  Since  this  statistic  is  an  integral  part  of  the  tree  building 
algorithm,  the  test  is  easily  implemented.  The  null  hypothesis  of  the 
test  is  that  the  two  samples  are  drawn  from  the  same  population.  This 
hypothesis  is  rejected  if  the  D^(xt)  is  greater  than  some  value  that  depends 
on  the  sample  sizes  and  the  required  confidence.  The  rationale  of  using 
this  test  as  a  test  for  a  terminal  node  is  that  if  the  algorithm  is  going 
to  partition  the  classes  with  regard  to  some  feature,  the  difference 
between  the  populations  along  that  feature  should  be  real  to  some  ascribable 
confidence. 

One  important  quality  of  the  Friedman  Tree  algorithm  is  its  invariance 
under  monotone  transformations  of  the  features.  That  is,  the  features  used 
to  partition  the  classes  at  each  node  in  the  tree  will  remain  the  same  no 


matter  what  monotone  transformation  is  applied  to  them.  In  cases  such  as 
the  one  at  hand  in  which  the  features  are  positive  numbers,  this  means  that 

the  tree  produced  will  be  the  same  for  the  choice  of  a  basic  feature  x.  as 

2  3  * . 

it  will  for  the  choice  of  x..,  x.,  log(x.j),  1/x^  l/(l+x.),  e  ,  etc. 

as  basic  features.  The  only  difference  between  the  various  trees  will  be 

the  value  of  the  cut  at  each  node. 

For  the  more  general  case  of  f  ^^z*  t^ie  a^9°rithm  must  be 

modified  slightly.  Since  equation  2.3.2  does  not  reduce  to  equation  2.3.3, 

equation  2.3.2  must  be  used  directly.  That  is,  for  a  given  feature  i,  the 

choice  of  a  point  on  which  to  partition  is  the  one  that  yields  the  minimum 

cost: 

C.  =  Jl-j (1  "  F-|  (x! ) )  +  &2Tt2  F 2 ( x )  (2.3.4) 

and  the  optimum  feature  is  the  one  with  the  minimum  .  The  balance  of 
the  algorithm  including  the  terminal  node  definition  remains  as  before. 

For  the  M-class  discrimination  problem,  it  is  possible  to  use  the 
Friedman  Tree  approach  by  considering  this  to  be  M  two  class  problems.  That 
is,  for  each  of  the  M  classes  a  tree  is  built  with  a  specific  class  being 
class  1  and  the  remaining  classes  being  class  2.  Each  leaf  of  the  various 
trees  are  labeled  with  the  cost  of  deciding  that  the  measurement  was  a 
member  of  the  class  1  of  the  respective  tree.  A  measurement  is  then 
identified  with  the  class  whose  tree  has  the  lowest  class  1  leaf  cost. 
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2.4  Development  of  Inversion  Algorithm  for  Continuous-Valued  Parameters 

In  this  section,  the  methods  of  Section  2.2  are  specialized 
to  the  problem  of  determining  continuous-valued  parameters  under  certain 
important  assumptions  needed  to  solve  for  the  meteorological  parameters 
of  interest.  We  have  considered  the  following  inversion  methods: 

(1 )  1  inear  regression 

(2)  nonlinear  regression 

(3)  extended  Kalman  Filter 

(4)  iterated  extended  Kalman  Filter 

(5)  modal  estimator 

The  first  two  methods,  based  on  regression  models,  are  distinctly 
different  in  philosophy  than  the  last  three  methods,  which  are  based  on 
Bayesian  statistical  inference.  These  two  philosophies  are  now  discussed 
and  compared. 

2.4.1  Regression  Methods 

The  use  of  regression  analysis  is  a  powerful  tool  in  analyzing 
the  relationship  between  sets  of  variables  of  different  types.  In  the 
present  problem,  linear  and  nonlinear  regression  models  for  solving  the 
inversion  problem  have  been  developed. 

In  computing  regression  models,  it  is  important  to  determine 
the  effect  of  using  different  subsets  of  independent  variables.  It  is 
desirable  to  use  only  those  variables  which  significantly  reduce  the  regression 
errors.  Overfitting  can  lead  to  models  which  are  overly  sensitive  to  noise.  Step¬ 
wise  regression  provides  a  partial  automation  of  the  variable  selection 


process.  It  is  based  on  a  techrtkjue  which  in  the  process  of  computing 

•  \ 

an  ordinary  regression  on  m  predictors' obtains,  at  essentially  no  extra 
expense,  m  intermediate  regressions  which  are  usefulin  determining 
functional  relationships  between  the  dependent  variable  and  several- 
selected  subsets  of  the  total  set  of  independent  variables,  or  predictors. 

In  the  simplest  case,  one  variable  is  added  at  each  step.  There  are 
several  meaningful  statistical  criteria  which  may  be  used  to  determine 
which  predictor  variable  to  add  to  the  model: 

(i)  add  the  predictor  whose  F-to-enter  statistic  has  the 
largest  value1, 

(ii)  add  the  oredictor  that  gives  the  greatest  decrease  in 
the  residual  sum  of  squares; 

(iii)  add  the  predictor  which  gives  the  greatest  increase 

in  the  multiple  correlation  between  the  dependent  variable 
and  the  predictors. 

The  F-to-enter  statistic  for  a  predictor  is  the  F-statistic  for 
testing  the  significance  of  the  regression  coefficient  the  predictor  would 
have  if  it  were  added.  All  three  of  these  criteria  are,  in  fact  mathe¬ 
matically  equivalent;  however  the  F-to-enter  statistic  was  used  in  programs 
utilized  in  this  study. 

In  addition  to  adding  variables  it  is  also  desirable  to  provide  for 
removal  of  variables.  It  may  happen  that  recently  added  variables  may  in 
combination  render  a  variable  entered  at  an  earlier  state  statistically 
insignificant.  Removal  of  variables  is  based  on  an  F-to-remove  test. 
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In  the  stepwise  regression  procedure,  the  test  for  removal  of  predictors 
is  first  made  for  each  predictor  already  in  the  model.  If  a  predictor  is 
removed,  the  program  proceeds  to  the  next  step.  If  no  predictor  is  removed, 
the  step  continues  with  the  addition  of  a  predictor.  The  stepwise  regres¬ 
sion  procedure  terminates  when  no  predictor  is  either  deleted  or  added  at  a 
step.  The  details  of  the  stepwise  regression  procedure  may  be  found,  for 
example,  in  Draper  and  Smith  (1966), 

The  regression  method  used  throughout  this  study  was  performed 
by  the  program  RLSEP,  which  was  available  at  AFGL  as  part  of  the  Inter¬ 
national  Mathematical  Statistics  Library  (IMSL), 

It  should  be  pointed  out  that  the  statistical  D  method  which  has 
been  used  in  several  meteorological  parameter  inversion  studies,  is  a 
regression  technique. 


In  order  to  compute  the  conditional  density  p(x|y)  we  need: 

(a)  the  priori  density  p(x) 

(b)  the  conditional  density  p(y|x),  which  can  be  computed  from 
the  model  y  =  h(x)  +  v  if  the  density  of  v  is  known. 

Given  p(x|y)  we  have  considered  two  different  types  of  estimators: 
(1)  minimum  variance,  (2)  maximum  likelihood. 


Maximum  Likelihood  Estimate 

The  most  likely  value  of  x  is  given  by  the  mode 

x  1  =  arg  max  p(x|y)  (2.4.3) 

xenx 

and  is  sometimes  referred  to  as  the  maximum  likelihood  Bayes  estimate. 

A  A 

xmi  can,  in  principle,  be  found  more  easily  than  xmv>  since  integration 
over  a  is  not  required.  However,  for  nonlinear  problems,  exhaustive 

X 

search  techniques  may  have  to  be  used  to  find  the  global  maximum  of  p(x|y). 

2.4.3  Comparison  of  Bayesian  and  Regression  Methods 

It  is  instructive  to  compare  Bayesian  and  regression  methods  to 
assess  the  fundamental  differences.  This  is  done  in  Tables  2.4.1  and  2.4.2. 
Note  that  both  methods  require  a  model  of  the  system.  The  Bayesian  approach  uses 
a  forward  model  which  is  based  on  the  physics  of  the  problem  (i.e.,  a  model 
to  predict  intensity  or  brightness  temperature  is  utilized).  Additive 


noise  is  included  to  account  for  two  effects: 

(1)  sensor  noise; 

(2)  modeling  errors  due  to  uncertainty  in  the  function  h(«)- 
Once  the  forward  model  is  known,  estimates  are  found  as  a  function  of 
the  measurement  y,  the  mean  and  covariance  of  the  prior  estimate  (m,  P  ) 
and  the  covariance  of  measurement  noise  (V). 

In  contrast,  regression  methods  use  an  inverse  model,  which  is 
not  physically-based,  but  rather  must  be  generated  statistically  from 
data.  Noise  may  be  included  in  the  model,  as  with  the  statistical-D 
method.  The  Bayesian  technique  has  advantages  in  handling  missing  data 
and  checking  for  bad  data.  Since  the  covariance  of  the  measurement 
residuals  is  known,  (c.f.  Eq.  (2.2.7)),  it  is  simple  to  derive  statistical 
outlier  tests  to  throw  out  bad  data. 

Another  significant  advantage  of  the  Bayesian  approach  would 
accrue  during  on-line  data  processing  in  which  parameters  are  estimated, 
or  tracked,  over  time.  The  Bayesian  estimators  we  propose  to  use  not 
only  provide  estimates  of  the  desired  parameters  but  also  provide  estimates 
of  the  covariances  of  estimation  errors,  since  the  full  estimation  error 
covariance  matrix  is  computed  as  part  of  the  Kalman  Filtering  process. 

This  matrix  is  used  to  optimally  weight  the  information  coming  in 
via  the  measurements  relative  to  the  information  contained  in  all  past 


measurements. 
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2.4.4  Development  of  Bayesian  Inversion  Methods 
2. 4. 4.1  Extended  Kalman  Filter 

The  state,  x,  of  interest  contains  all  of  the  parameters 
we  wish  to  estimate.  It  is  assumed  to  have  mean  m  and  covariance  matrix 
P  .  The  measurement  is  of  the  form 

y  =  h(x)  +  v  (2.4.4) 

with  v  a  zero-mean  random  variable  with  covariance  matrix  V,  independent 
of  x. 

The  form  of  the  estimator  is 

x  =  m  +  M  [y  -  y]  (2.4.5) 

where  y  =  h(m)  is  the  predicted  value  of  y  and  v  =  y-y  is  the  measurement 
residual,  the  difference  between  the  actual  and  expected  measurement.  If 
y  =  h(m)  there  is  no  information  contained  in  y  relative  to  m  and  x  =  m. 

The  matrix  M,  referred  to  as  the  Kalman  gain  matrix,  is  to  be  found  to 
yield  the  best  performance. 

Defining  the  estimation  error 

e  =  x-x  (2.4.6) 

we  have  the  error  covariance  matrix 

P  =  E[e  eT]  (2.4.7) 

where  £[•]  is  the  ensemble  average  of  [•]  and  (*)T  is  the  transpose  of  (•)• 
The  objective  of  a  minimum-variance  estimator  is  to  choose  M  such 
that  uTPu  is  minimized  for  any  real  non-zero  vector  u  of  the  same  dimension 
as  x.  If  h(*)  is  linear,  the  solution  is  the  classical  Kalman  Filter  (1960), 
given  in  Eqs.  (2.2.4)  -  (2.2.8).  For  nonlinear  h(*)»  we  must  resort  to 
approximate  solutions. 
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The  extended  Kalman  Filter  is  one  approach  to  nonlinear  problems 
and  is  based  on  a  linearized  measurement  equation.  Thus  we  assume 


h(x)  =  h(m)  + 


(x-m) 


x=m 


which  gives 

y  =  h(m)  +  H(m)(x-m)  +  v 

where 

3h(x) 


H(m)  = 


3X 


x=m 


If  we  define  a  new  measurement 


then 


y  =  y  -  h(m)  -  H(m) m 


y  =  H(m)  x  +  v 


(2.4.8) 

(2.4.9) 

(2.4.10) 

(2.4.11) 

(2.4.12) 


y  =  H(m) m  (2.4.13) 

and  the  measurement  y  is  linear  in  x.  Applying  the  Kalman  Filter  equations 
[cf.  (2.2.4)  -  2.2.8)]  to  this  problem  gives  the  following  solution 

x  =  m  +  M*  v  (2.4.14) 

\>  =  y  -  h(m)  (2.4.15) 

(2.4.16) 


M*  =  PQ  H(m)T  e"1 


T.  =  H(m)  PQ  H(m)  +  V 

P  =  P  -  M*  H(m)  P 
o  o 


(2.4.17) 

(2.4.18) 


where  M*  is  the  optimal  gain  matrix,  I  is  the  estimated  covariance  matrix 
for  the  residuals  v  and  P  is  the  estimated  covariance  matrix  for  the 
a  posteriori  estimation  errors. 
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This  estimator  will  perform  reasonably  well  in  problems  which 
are  not  highly  nonlinear.  The  effect  of  the  nonlinearity  can  be  explic¬ 
itly  demonstrated  by  computing  h(0  to  second  order  in  (2.4.8).  Let 
e  =  x-x,  eQ  =  x-m  and  assume  y  is  a  scalar  for  simplicity.  Then,  to 
second  order 

e  =  eQ  +  M  [-H(m)  eQ  +  ^  e^  S(m)  eQ  +  v]  (2.4.19) 


where 

Then,  defining  P+  to  be  the  actual  covariance  matrix  of  errors 
P+  =  E[e  eT] 


(2.4.20) 


=  [I  -  M  H(m)]  P  [I  -  M  H(m)]T 

=  M  [V  +  1  Ef(eJ  S(m)  eQ)2}]  MT  (2.4.21) 


It  can  be  shown  that  if  S(m)  =  0  and  M  =  M*,  then  P+  =  P.  Equation  (2.4.21), 

T  2 

good  to  second  order  for  any  M,  shows  explicitly  that  E { eQ  S(m)  eQ)  } 
must  be  small  compared  to  V  for  the  extended  Kalman  Filter  to  be  a  good 
estimator.  If  we  write 

£  =  z  +  E{(eJ  S(m)  eQ)2}  (2.4.22) 

and  use  %  in  place  of  z  in  (2.4.17),  then  M*  and  P  would  be  better 
approximations  and  the  filter  would  be  expected  to  perform  with  more 
accuracy. 

An  approach  which  has  generally  been  found  to  yield  better  results 
in  practice  was  developed  by  Denham  and  Pines  (1966)  and  is  called  the 
Iterated  Extended  Kalman  Filter. 
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2.4.4. 2  Iterated  Extended  Kalman  Filter 

In  equation  (2.4,21),  it  was  shown  how  nonlinearity  can  affect 
the  Extended  Kalman  Filter,  A  major  reduction  in  estimation  error  can 
be  made  by  a  simple  modification  of  the  Extended  Kalman  Filter,  The  modi¬ 
fication  consists  of  writing  y  in  (2.4.5)  as 
y  =  h(m) 

=  h(x)  +  H(x)  (m-x)  (2.4,23) 

A  A 

for  any  estimate  x.  For  nonlinear  h(*)  it  is  expected  that  y-y  will  be 
smaller  using  (2.4,23)  for  y  rather  than  using  y  =  h(m),  especially  if  the 
nonlinearities  are  significant.  This  observation  provides  the  motivation 


for  the  following  iteration: 

(i )  Set  =  m 

(ii)  Compute 

3^k+1)  =  m  +  v<k>  (2.4.24) 

v(k)  =  y  .  h(f^k))  -  H(B(k))  (m  -  B(k))  (2.4.25) 

H(6(k))  =  _9hM|  (k)  (2.4.26) 

x=e 

=  P0  H(f^kV  [z^h'1  (2.4.27) 

z(k)  =  H ( 0 ^ k ^  PQ  H(b^)T  +  v  (2.4.28) 

(iii)  Compute 

n(k)  =  II  6(kt’>  -  6(k)l| 

for  some  norm  ||*|| 


(kl 

(iv)  If  nv  ;  >  e ,  return  to  (ii);  otherwise  continue 


f 


(2.4.29) 


(v)  Compute  new  state  estimate  as 

x  . 

(vi)  Update  estimation  error  covariance  matrix 

P  =  p0  -  M^k+1^  H(e^k+1^)  PQ  (2.4.30) 

The  effect  of  using  this  scheme  can  be  evaluated  by  comparing  the 
estimation  error  covariances  for  the  iterated  filter  to  that  of  the 
non-iterated  filter  (eq.  (2.4.21)),  Again  we  assume  a  scalar  measurement. 
Then,  to  second  order 

e(k+1>  =x-  3(k+1) 

=  eQ  +  M(k)  [-H(e(k))  eQ  +  ^(e(k))T  S(e(k))  e(k)  +  v] 

(2,4.31) 

Comparing  this  with  (2.4.19),  which  is  the  Extended  Kalman  Filter  results 

shows  that  a  significant  reduction  in  the  nonlinear  term  has  been  made, 

(k) 

due  to  the  fact  that  e'  '  will  almost  always  be  smaller  than  eQ  if  the 
algorithm  converges. 

2. 4. 4. 3  Estimation  of  the  Mode 

From  (2.4,1)  and  (2.4,3),  the  mode  of  the  a  posteriori  density 
p(x|y)  is  given  by 

x  =  arg  max  p(y|x)  p(x)  (2,4.32) 

xefix 

and  is  equivalent  to  the  Bayes  maximum  likelihood  estimate.  In  order  to 
find  x^,  we  need  to  specify  the  probability  densities  of  x  and  the  noise  v. 
We  assume  x  to  be  normally  distributed  with  mean  m  and  covariance  matrix  PQ 
and  assume  v  to  be  normally  distributed  with  zero  mean  and  covariance 
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matrix  V.  Thus,  if  x  is  of  dimension  n  and  y  is  of  dimension  q: 

P(x)  =  - =77 -  exp  |  -  i  (x  -  m)T  P'1  (x-m)l  (2.4,33) 

(2^)n/2|PQ|  l  2  0  i 


P(y  I  x)  = - =77 -  expj  -  i  (y- h(x))T  V"1  (y-h(x))}  (2.4.34) 

(2ir)q/^  |  V  |  l  2  J 


where  |(*)|  denotes  the  determinant  of  (•).  Neglecting  constant  terms 
and  noting  that  the  logarithm  is  a  monotone  function,  we  see  that  (2.4.32) 
is  equivalent  to 

x  =  arg  max  <j>(x)  (2,4,35) 

Xefix 

where 

<j)(x)  =  (x  -  m)T  Pq1  (x-m)  +  (y-h(x))T  V"1  (y-h(x))  (2.4,36) 

At  the  mode 


or 


34>(x) 

9X 


„  =  0 

x=xm 

m 


x  =  m  +  P  H(x  )T  V"1  [y-  h(x  )] 
m  o  m  w  m 


(2.4.37) 


(2.4.38) 


where 


H(i  )  = 

v  m;  9x 


(2.4.39) 


x=x_ 


Equation  (2.4.38)  can  be  solved,  in  principle,  in  a  variety  of  ways. 
However,  we  note  here  that  the  appropriate  method  to  be  used  may  depend 
upon  the  functional  form  of  h(>)  as  well  as  the  value  of  the  initial  guess. 


III.  INVERSIONS  USING  MICROWAVE  DATA  FOR 


CONTINUOUS-VALUED  PARAMETERS 

3.1  Introduction 

It  is  convenient  to  assemble  the  parameters  we  wish  to  estimate 
together  into  a  single  state  vector  x: 

cloud  parameters 
x  =  surface  parameters 

_ atmospheric  parameters 

We  are  given  an  a  priori  estimate  of  x  in  the  form 

Xq  =  f(latitude,  season,  forecasts,  etc.) 

=  f(a) 

Here  a  caret  (*)  denotes  an  estimate,  f(*)  is  a  known  function  obtained 
from  a  priori  data  analysis  and  a  is  used  to  represent  a  vector  of 
parameters  used  to  predict  x  prior  to  incorporating  the  microwave  data. 

The  next  step  is  to  process  the  microwave  data  to  find  a  better 
estimate  of  x.  Assume  that  a  total  of  m  microwave  channels  are  used  and 
that  all  data  is  obtained  simultaneously.  If  the  measurements  are  collected 
into  a  m-dimensional  vector  y  then  the  problem  is  to  find  a  "best"  esti¬ 
mator  for  x  of  the  form 

x  -  g(xQ,y) 

Before  presenting  the  problem  solutions  which  have  been  investi¬ 
gated,  we  will  give  a  brief  description  of  the  method  of  approach. 
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3.2  Method  of  Approach 

The  method  of  approach  which  has  been  used  for  inversion  of  microwave 
data  may  be  summarized  as  follows: 


(a)  Generate  Data  Base  Using  Unconditional  Statistics  of  the 
State 


(b)  Forward  Simulation 


{XpX2*  ’  '  '  *  SELECTED  to  account  for  expected 

STATISTICAL  VARIATIONS  IN  X 


FIND  h(')  TO  MINIMIZE  SELECTED  NORM  ON  (Ev  ,Ev 

Y1  Y2 


GENERALLY/  USE  LEAST  SQUARES. 


(d)  Develop  Inversion  Techniques 


a  x0 


INVERSION 

METHOD 


X. 

1 


X. 


0. 


->Ex, 


Ev  =  INVERSION  ERROR 
Xi 


(e)  Evaluation 

•  ANALYZE  MODELING  ERRORS  f E„  } 

'  i 

•  ANALYZE  INVERSION  ERRORS  (Ev  > 

r\  , 

1 

•  DESIGN  OF  TRAINING/TEST  SETS 

•  EFFECT  OF  MISSING  DATA 

3.2.1  Data  Base 

The  data  bases  were  constructed  so  as  to  cover  the  expected  variations 
in  the  independent  variables.  The  independent  data  were  determined  on  the 
basis  of  discussions  with  ERT  and  AFGL  personnel.  Three  different  data  bases 
were  used,  which  are  described  in  the  sequel.  The  basic  parameters  of 
interest  are  given  in  Table  3.2.1. 

3.2.2  Forward  Model  Computation 

Each  data  base  was  used  to  generate  a  set  of  forward  models  for  pre¬ 
dicting  microwave  brightness  temperatures,  in  ®K,  from  the  independent 
variables.  Four  different  channels  were  run:  10.7  GHz,  19  GHz,  37  GHz, 
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TABLE  3.2.1 
Inversion  Parameters 


LWC  -  integrated  liquid  water  content  within 
2 

(milligram/cm  ) 


WS  =  wind  speed  over  the  ocean  surface 
(m/sec) 


WST 


3  ;  WS  <  8 

WS  ;  8<  WS  <  22.5 
22.5  ;  WS  >  22.5 


DLT  =  atmospheric  temperature  deviation  (°K) 
at  all  altitudes 


CT  =  cloud  thickness 

(hundreds  of  meters) 

CH  =  cloud  top  height 

(hundreds  of  meters) 

WV  =  integrated  atmospheric  water  vapor 
(gm/cm2) 

R  =  rainfall  rate 
(mm/hr) 


water  clouds 


assumed  constant 
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and  94  GHz  using  both  linear  and  nonlinear  regression.  The  best  models  are 
generally  quite  nonlinear.  The  F  ratios,  goodness-of-fit  (R  )  and  rms 
regression  error  (a)  were  evaluated  to  assess  the  characteristics  of  the 
regression  models. 

3.2.3  Inversion  Techniques 

The  different  types  of  inversion  techniques  investigated  during  the 
study  have  been  discussed  in  Section  II.  There  are  two  basic  types  of 
techniques:  (1)  regression,  (2)  Bayesian.  These  were  compared  in  Table  2.4.2 
and  it  was  shown  that  Bayesian  techniques  offer  more  flexibility  in  data 
handling  and  modeling  than  regression  techniques  do. 

The  emphasis  in  the  present  work  is  on  Bayesian  analysis.  The  basic 
ideas  employed  in  the  Bayesian  approach  have  been  outlined  in  Section  2.4. 

The  basic  approach  has  been  utilized  to  determine  several  different  types 
of  estimates.  Minimum  variance  estimates  are  desirable  to  achieve,  but  in 
practice  are  very  difficult  to  compute  for  nonlinear  problems.  No  general 
methods  presently  exist  for  computation  of  the  conditional  mean  in  non¬ 
linear  problems.  For  this  reason,  it  is  common  in  practice  to  rely  on 
other  types  of  estimates.  The  maximum  likelihood  estimate  can  be  found  in 
principle  by  solving  a  nonlinear  programming  problem-,  i.e.,  one  can  actually 
compute  maximum  likelihood  estimates.  However,  the  disadvantage  of  these 
estimates  is  that  no  prior  information  is  used.  This  is  an  important 
consideration  in  the  present  problem,  since  a  priori  estimates  carry  a 
significant  amount  of  information  relative  to  the  desired  estimates.  The 
mode  of  the  a  posteriori  density  is  an  attractive  estimate,  since  it  contains 
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the  a^  priori  information  and  can  be  computed  in  practice,  subject  to  certain 
moderate  restrictions.  The  iterated-extended  Kalman  filter  described  in  the 
sequel  can  be  viewed  as  an  approximate  modal  estimator  for  the  a^  posteriori 
density. 

3 . 3  Inversions  for  Data  Set  #1 
3.3.1  Data  Base 

The  first  data  set  used  for  numerical  studies  is  summarized  in 
Table  3.3.1  and  consisted  of  803  simulations  using  the  GABTAWF  program. 

The  data  in  the  table  were  used  to  evaluate  the  performance  of  the  inversion 
techniques.  In  order  to  achieve  better  regression  model  fits  in  the 
presence  of  errors,  a  total  of  60  additional  cases  were  utilized  in  the 
fitting  set  which  consisted  of  rarely-occurring  cases  in  which  the  clouds 


had  an  unusually  high 

liquid  water 

content. 

This 

also  lends  to  reduce 

the  fit  error  at  outlying  points  in 

the  test 

set. 

The  additional  cases 

consisted  of  ten 

values  of  LWC  (.286,  .312,., 

■  »  »  • 

520)  with  the  following 

six  conditions  on 

i  the 

other  variables: 

WST 

.  DLT 

rr 

CH 

W  V 

1. 

8 

0 

25 

49 

1.932 

2. 

8 

0 

25 

49 

1 .375 

3. 

20 

0 

25 

49 

1.375 

4. 

8 

0 

25 

49 

2.442 

5. 

20 

0 

25 

49 

2.442 

6. 

20 

0 

25 

49 

1 .932 

All  data  were  run  for  Midlatitude  Spring  conditions  over  the  ocean 
with  the  sun  overhead.  A  summary  of  the  estimated  parameters  and  tech¬ 
niques  to  be  evaluated  are  given  in  Table  3.3.2. 


TABLE  3.3.1(b) 

Independent  Data  Used  for  Numerical 
Studies  -  Data  Base  #1 


Integrated  Liquid  Water  ( 

Content 

2 

gm/cm 

# 

.006, (.006), .120* 

600 

.0043, (.0043), .086 

40 

.0076, (.0076), .152 

40 

.026, (.026), .260 

60 

0 

45 

.009 

2 

.0165 

4 

.0240 

4 

.0315 

4 

.0390 

4 

total 

803 

a,(b),c  denotes  values  starting  at  a  with 
increment  b  and  final  value  c 
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TABLE  3.3.2 

DATA  FOR  NUMERICAL  EXPERIMENTS 


MIDLATITUDE  SPRING 

SUN  OVERHEAD 
OVER  SEA 

STATE: 


"INTEGRATED  LIQUID  WATER  ~ 

~  LWC " 

CLOUD  THICKNESS 

CT 

CLOUD  TOP  HEIGHT 

CH 

SURFACE  WIND  SPEED 

WST 

ATMOSPHERIC  TEMPERATURE 

DLT 

»  INTEGRATED  WATER  VAPOR 

_WV  . 

WST  =  TRUNCATED  WIND  SPEED 
(WST  >  8  M/SEC) 

DLT  =  CONSTANT  TEMPERATURE  VARIATION  AT  ALL  HEIGHTS 
- >  ERROR  =  TRUE  -  ESTIMATED 

Estimates: 

OL  *  OPEN  LOOP  (USE  A  PRIORI  MEAN) 

L  *  LINEAR  REGRESSION 

NL  *  NONLINEAR  REGRESSION 

EKF  =  EXTENDED  KALMAN  FILTER 

IEKF  *  ITERATED  EXTENDED  KALMAN  FILTER 
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3.3.2  Forward  Models 

The  forward  regression  models  for  three  microwave  channels  (19,  37, 

94  GHz)  are  shown  in  Tables  3.3.3  -  3.3.5  and  are  seen  to  be  highly  nonlinear. 
The  19  GHz  temperature  predictors  of  most  significance  are  integrated  liquid 
water  and  water  vapor.  Less  significant  predictors  are  wind  speed  (i.e., 
foam  coverage),  cloud  thickness  -  height  product  and  the  product  of  cloud 
height  and  integrated  liquid  water  content.  The  most  significant  predictor 
for  the  37  CHz  channel  is  the  product  of  integrated  liquid  water  content  and 
wind  speed.  Water  vapor  is  seen  to  have  little  effect.  At  94  GHz,  the 
square  of  integrated  liquid  water  content,  alone  and  in  combination  with 
cloud  thickness,  is  seen  to  be  a  strong  predictor.  Other  combinations  of 
these  two  variables  are  seen  to  be  of  significance. 

3.3.3  Inversion  Results 

Regression  models  for  direct  inversion  for  the  parameters  are  given  in 

Tables  3.3.6  -  3.3.10  and  are  seen  to  be  highly  nonlinear.  In  the  tables, 

m.  is  the  dependent  mean  and  a  .  is  the  rms  variation  of  the  parameter 
d  prior 

over  the  863  cases  in  the  data  base. 

The  inversion  errors  using  two  different  regression  models  are 
summarized  in  Table  3.3.11.  A  16-th  order  model  for  LWC  yields  somewhat 
better  results  than  an  11-th  order  model  (Table  3.3.6);  however,  the  results 
for  the  other  parameters  are  almost  identical. 

Inversion  errors  for  the  extended  and  iterated  extended  Kalman  Filter 
are  shown  in  Table  3.3.12.  The  parameter  or  is  the  rms  channel  noise,  in  °K, 
utilized  in  the  filter.  Thus  the  measurement  covariance  matrix  V  is  either 
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TABLE  3,3.3 

FORWARD  MODEL  FOR  PREDICTING  19  GHz  TEMP, 
(units  given  in  table  3,2,1) 

T19 

REGRESSION  VARIABLES 

coeff,  F 


1 

LWC 

.16 

5479 

2 

WST 

-2.9 

1412 

3  ! 

DLT 

-.27 

208 

A 

CT 

,08 

49 

5 

CH 

-.016 

12 

6 

WV 

6.1 

5414 

7 

CT-CT 

-12  E-5 

935 

8 

CT-CH 

-20  E-4 

1516 

9 

LWC  *  CT 

-21  E-4 

427 

10 

LWC  *  CH 

25  E-4 

1452 

11 

LWC/CT 

0 

0 

12 

WST -WST 

.165 

355 

13 

CT-CH 

0 

0 

14 

WV-WV 

0 

0 

15 

LWC* LWC -CT 

0 

0 

16 

LWC  *  LWC  *  CH 

0 

0 

R2  =  .9976 
«  -  0.93"c 

INTERCEPT:  135,7 
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TABLE  3.3,4 

FORWARD  MODEL  FOR  PREDICTING  37  GHz  TEMP. 
(units  given  in  table  3.2.1) 

T37 


COEFF. 


1 

LWC 

.53 

2 

WST 

-2.2 

3 

DLT 

-.46 

4 

CT 

0 

5 

CH 

-.036 

6 

WV 

3,19 

7 

LWC *LWC 

-65  E-5 

8 

LWC -WST 

-38  E-4 

9 

LWC  *  CT 

-.016 

10 

LWC • CH 

68  E-4 

11 

LWC/CT 

-.11 

12 

WST  *  WST 

.13 

13 

CT-CH 

.011 

14 

WV-WV 

0 

15 

LWC- LWC -CT 

64  E-6 

16 

LWC- LWC -CH 

-29  E-6 

R2  -  ,996 
0  =  1.62'c 

INTERCEPT:  158.9 


F 

634 

255 

198 

0 

9.8 

473 

42 

1639 

664 

275 

4.6 

754 

512 

0 

269 

78 
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TABLE  3,3,5 

FORWARD  MODEL  FOR  PREDICTING  94  GHz  TEMP. 
(units  given  in  table  3.2.1) 

T94 

REGRESSION  VARIABLES 


COEFF. 

F 

1 

LWC 

1.27 

771 

2 

WST 

.393 

207 

3 

DLT 

.737 

| 

110 

4 

CT 

1.43 

978 

5 

CH 

.092 

16 

6 

WV 

11,6 

36 

7 

LWC -LWC 

-54  E-4 

598 

8 

LWC -WST 

-15  E-4 

62 

9 

LWC*  CT 

-.043 

1095 

10 

LWC*  CH 

-35  E-4 

15 

11 

LWC/CT 

-,2b 

5 

12 

WST*  WST 

0 

0 

13 

CT-CH 

0 

0 

14 

WV-WV 

-2,3 

16 

15 

LWC* LWC -CT 

26  E-5 

946 

16 

LWC- LWC -CH 

-22  E-6 

10 

R2 

=  ,943 

o  =  3,4°c 
intercept:  314.0 


TABLE  3.3.6 

REGRESSION  MODEL  FOR  LWC  (mill i grams/cm2) 
(defined  with  863  points) 


variable 

coefficient 

F  value 

T19 

-  85.4 

118. 

T37 

-  79.8 

395. 

T94 

81.2 

126. 

1./T19 

76.8  E+04 

15.6 

1./T37 

-  12.5  E+05 

56.6 

1./T94 

0. 

0. 

T19.T37 

0.411 

1265. 

T19.T94 

-  0.224 

75.7 

T37.T94 

0. 

0. 

T19/T37 

12.3  E+03 

123. 

T19/T94 

0. 

0. 

T37/T19 

4723 

82.6 

T37/T94 

1273. 

23.2 

T94/T19 

-6342. 

127. 

T94/T37 

0. 

0. 

dependent  mean  88. 

R2  .990 

intercept  estimate  -224. 
error  mean  -2.097  E-03 
error  standard  deviation  9.98 
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TABLE  3.3,7 

REGRESSION  MODEL  FOR  WST  (m/sec) 
(defined  with  863  points) 


variable 

coefficient 

F  value 

T19 

3.66 

41. 

T37 

6.32 

220. 

T94 

-  .943 

36.7 

1./T19 

61.4  E+03 

42.7 

1./T37 

0. 

0. 

1./T94 

0. 

0. 

T19.T37 

-  18.4  E-03 

135. 

T19.T94 

0. 

0. 

T37.T94 

0. 

0. 

T19/T37 

-995. 

202. 

T19/T94 

812. 

112. 

T37/T19 

-645. 

295. 

T37/T94 

-936. 

104. 

T94/T19 

0. 

0. 

T94/T37 

0. 

0. 

dependent  mean  13.66 
R2  .848 

intercept  estimate  442.5 
error  mean  -3.260  E-05 
error  standard  deviation  2.23 


prior 


5. 


6- 


TABLE  3.3,8 

REGRESSION  MODEL  FOR  CT  Cm x  TO2) 
(defined  with  863  data  points) 


variable 

coefficient 

F  value 

T19 

-  2.73 

59.7 

T37 

0. 

0. 

T94 

-  1.04 

79.5 

1./T19 

-  34.65  E+03 

34.7 

1./T37 

0. 

0. 

1./T94 

-  22.40  E+03 

7.5 

T19.T37 

65.6  E-04 

114.7 

T19.T94 

0. 

0. 

T37.T94 

0. 

0. 

T19/T37 

0. 

0. 

T19/T94 

0. 

0. 

T37/T19 

0. 

0. 

T37/T94 

0. 

0. 

T94/T19 

0. 

0. 

T94/T37 

141. 

81.3 

dependent  mean  7.75 
R2  .608 

intercept  estimate  622.23 
error  mean  2.278  E-04 
error  standard  deviation  4.57 

°prior 


7.29 


TABLE  3,3.9 


REGRESSION  MODEL  FOR  CH  (mx  102) 
(defined  with  863  data  points) 


variabl e 

coefficient 

F  value 

T19 

0. 

0. 

T37 

0. 

0. 

T94 

9.07 

171. 

1./T19 

0. 

0. 

1./T37 

0. 

0. 

1./T94 

0. 

0. 

T19.T37 

51.1  E-03 

269. 

T19.T94 

-  29.6  E-03 

36. 

T37.T94 

-  33.7  E-03 

388. 

T19/T37 

3271. 

13. 

T19/T94 

-6950. 

41. 

T37/T19 

-3124. 

40. 

T37/T94 

2334. 

8. 

T94/T19 

1733. 

17. 

T94/T37 

-3910. 

22. 

dependent  mean  33.19 
R2  .730 

intercept  estimate  5071. 
error  mean  -6.613  E-04 
error  standard  deviation  7.37 


TABLE  3.3.10 


REGRESSION  MODEL  FOR  WV  (gm/cm2) 
(defined  with  863  data  points) 


variable 

coefficient 

F  value 

T19 

0. 

0. 

T37 

0. 

0. 

T94 

0. 

0. 

1./T19 

-5508. 

122. 

1./T37 

8483. 

130. 

1./T94 

-4629. 

53. 

T19.T37 

-  75.5  E-06 

9. 

T19.T94 

0. 

0. 

T37.T94 

0. 

0. 

T19/T37 

0. 

0. 

T19/T94 

20.8 

25. 

T37/T19 

32.4 

98. 

T37/T94 

0. 

0. 

T94/T19 

0. 

0. 

T94/T37 

0. 

0. 

dependent  mean  1 .74 
R2  . 304 

intercept  estimate  -39.3 
•>' r -ir  mean  -7.97  E-07 


• iraard  deviation  . 327 


t o 
<v 
c 


uoissaj6a^  ueaui  [ 
-uon 


803  data  points  for  evaluation 
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Note  that  using  ar  =  1 0  minimizes  the  LWC  error.  We  remark  that  setting 

o  =  0  does  not  minimize  the  rms  error  since  the  measurements  are  nonlinear. 

Strictly  speaking,  the  use  of  ar =  0  with  noise-free  measurements  will  yield 

minimum  rms  errors  only  for  linear  measurements.  The  errors  for  the 

other  parameters  are  almost  identical  for  or  =  0.5°,  1°,  2°  and  for  the 

case  o^g  =  1°,  a37  =  2°,  a ^  =  3°.  The  effect  of  adding  actual  noise  to 

the  data  is  shown  in  Table  3.3.13,  where  a  represents  the  rms  noise,  in  °K, 

n2 

added  independently  to  each  of  the  three  channels.  The  LWC  and  WST  errors 

are  degraded  only  slightly  at  a  =1,  but  are  degraded  significantly  at 

n2 

a  =5.  The  errors  for  the  other  parameters  are  degraded  less  significantly. 
n2 

Table  3.3.14  shows  the  effect  of  noise  added  to  both  the  fitting  set 

(rms  value =  a  )  and  the  test  set  (rms  value =  a  ).  The  degradation  is 
nl  n2 

seen  to  increase  for  a  higher-order  regression  model,  a  general  characteristic 
of  the  regression  approach. 

Plots  of  the  estimation  error  for  several  selected  cases  for  each  of 
the  methods  studied  are  given  in  Figures  3.3.1  -3.3.19.  The  actual  esti¬ 
mation  errors  are  plotted  vs.  LWC  in  each  case.  Abbreviations  are  defined 
in  Table  3.3.2. 

3 . 4  Inversions  for  Data  Set  #2 
3.4.1  Data  Base 

Several  new  cases  were  added  to  the  data  base  in  order  to  include 
additional  parameters  of  interest.  The  10.69  GHz  channel  was  added  to  the 
microwave  sensor  complement  to  assess  its  capability  in  resolving  surface 
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_ - _ — 


EFFECT  OF  ACTUAL  MEASUREMENT  NOISE  ON  IEKF 
_  =  rms  measurement  noise  (°C),  803  points 


error  (g/cm 


Figure  3.3.1 
LWC  ERROR 


2 

water  vapor  =  1.704  gm/cm 
wind  speed  =22.5  m/sec 
A  temp  =  0°C 
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0.10 


0.06  0.09 
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Figure  3.3.7 


WST  ERROR 
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parameters.  In  addition,  a  series  of  rain  cases  and  ice  clouds  were  added. 

A  total  of  941  microwave  cases  were  run  using  detailed  simulations,  as 
described  in  Section  3.1.  A  summary  of  these  cases  is  given  in  Table  3.4.1. 

The  predictor  correlation  matrix  for  the  variables  to  be  estimated  is 
shown  in  Table  3.4.2  with  the  diagonal  elements  corresponding  to  rms  values. 

3.4.2  Forward  Models 

A  series  of  forward  models  have  been  generated  for  the  microwave 

channels  based  on  the  941  test  cases.  The  approach  used  was  nonlinear 

regression  using  the  following  seven  basic  predictors: 

2 

LWC  mi  Hi  gram/cm 
WST  m/sec 

DLT  °K 

CT  hundreds  of  meters 

CH  hundreds  of  meters 

2 

WV  gm/cm 

R  mm/hr  (rain  rate) 

In  addition,  the  following  variables  related  to  foam  cover  were  employed: 

WCC  =  -0.00188  WST  +  0.000263  WST2  -  0.00000  WST3  +  18.3  E  -  07  WST 
SCC  =  WCC(0. 255  WST  -  0.299) 

These  variables  are  empirical  fits  to  Cardone's  (1969)  white  cap 
model  and  Ross  and  Cardone's  (1974)  total  foam  model.  The  two  variables 
represent  the  percentage  of  the  ocean  covered  by  whitecaps  and  streaks 
respectively.  They  have  been  presented  as  two  separate  variables  since 
their  microwave  emission  characteristics  differ. 


MICROWAVE 


941 


3  mm/hr: 

12 

15  mm/hr: 

16 

RAIN 

30  rmi/hr: 

24  nm/hr: 

16 

22 

12  nm/hr: 

22 

24  nm/hr: 

22 

5000-6000  m:  4 

ICE 

5000-7000  m 

2 

5000-8000  m:  2 

2400-2900  m 

342 

4400-4900  m 

200 

2400-4900  m 

62 

330-660  m 

40 

660-1320  m 

40 

400-2900  m 

o 

C. 

LIQUID 

900-2900  m 

2 

1400-2900  m 

2 

2400-4400  m 

2 

2400-3900  m 

2 

1900-2900  m 

2 

1000-1500  m 

100 

2400-3400  m: 

2 

NO  CLOUD  m: 

25 

05- 


Table  3.4.2  Predictor  Correlation  Matrix 


LUC 

WST 

DLT 

CT 

CH 

WV 

R 

312.25 

0.019 

5.75 

0. 

0. 

2.21 

0.823 

0.027 

0. 

9.80 

0.157 

0.048 

0. 

0.319 

13.55 

0.657 

0.009 

0. 

0.650 

-0.026 

0.477 

0.861 

0.016 

0. 

0.719 

0.118 

0.599 

5.78 

[ilfiPMlPl 


standard  deviations 
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A  summary  of  the  modeling  errors  for  different  order  models  is  given  in 
Table  3.4.3  and  Fig.  3.4.1  and  the  most  significant  terms  are  given  in  Tables 
3.4.4  -  3.4.7.  In  these  tables,  predictors  are  listed  in  the  order  in  which 
they  were  entered  into  the  model  via  the  stepwise  regression  procedure.  The 
coefficients  and  the  F  ratios  are  given,  as  well  as  the  rms  error  after  7,  10 
and  15  predictors  have  been  entered.  Also  given  is  the  intercept  estimate  yg 
and  the  a  priori  rms  error  Og.  An  asterisk  indicates  a  variable  that  was 
deleted  in  the  final  model. 

Table  3.4.3  Forward  Model  Summary 


•  errors  in  °K 


_ I!? _ 

T37 

T94 

A  priori 

mean 

0 

■ 

m 

202,8 

30.0 

259.2 

13.5 

7^  Order  Model 

0 

1.00 

2.25 

6.02 

4.01 

10^*  Order  Model 

0 

0.75 

1.56 

3.66 

3.64 

15^*  Order  Model 

a 

0.61 

1.15 

2.87 

3.52 

Highest  Order  Model 

order 

0 

19 

0.59 

27 

0.94 

27 

2.05 

24 

2.63 
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Table  3.4.4  Forward  Model  for  T 


Predictor 


Coefficient 


34 . 3/ -3 


R-CT-CH 

LWC2-WST 

LWC-CH 

LWC2-CT 

LWC-CT 


83. 5/-3 
57 . 4/ -9 
10. 4/ -4 
21.1/-8 


lwc2-ch 


-17.7/-7 


LWC-WST 

LWC3-CT 


CT-CH 


LWC-CH2 

R-WST 

LWC/CT 

LWC2-CH2 


-44.5/-5 


-26.0/ -4 
99 . 6/ -3 
14.4/-9 
38. 7/-7 
-31.8/-4 
47 . 6/ -3 
74.3/-10 
94. 7/ -7 
-72 . 6/ -4 


.8 


Table  3.4.5  Forward  Model  for 


# 

Predictor 

Coefficient 

F 

- — - 

o 

1 

LWC 

.116 

57 

18 

see 

-18.6 

8 

7 

LWC2 

-23 . 4/ -5 

79 

22 

LWC-WST2 

-27 . 5/ -5 

11 

6 

W  V 

9.63 

554 

10 

LWC-CH 

20. 2/ -4 

155 

9* 

LWC-CT 

0 

2.25 

27 

R/CT 

21.6 

176 

20 

lwc2-ct 

21.1/-10 

272 

3 

DLT 

-23 . 2/-2 

270 

1.56 

16 

lwc2-ch 

-29. 9/-7 

456 

26 

R-CT-CH 

86. 6/-3 

302 

21 

lwc2*wst 

76. 9/ -8 

258 

17 

wee 

314 

30 

13 

CT-CH 

-72 . 7/ -4 

190 

1.15 

24 

CT/WV 

.146 

8 

14 

LWC-CH2 

17.2/-6 

41 

23 

lwc2-wv 

10.3/-5 

81 

29 

R-WST 

-22.0/-3 

114 

28 

R/WST 

-3.26 

98 

.94 

y0  =  115.9 


Oq  =  31.0 
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Table  3.4.6  Forward  Model  for  T^-, 


# 

Predictor 

Coefficient 

F 

11 

LWC/CT 

00 

rH 

1 

11 

4 

CT 

.445 

27 

19 

see 

-39.7 

8 

5 

LWC2 

29 . 4/ -8 

367 

10 

LWC-CH 

81.5/-4 

388 

16 

LWC2*  CH 

-25 . 9/ -6 

419 

20* 

LWC2*CT 

0 

13 

CT-CH 

-13.5/-J 

139 

17 

LWC2*CH2 

21.1/-8 

227 

22 

LWC-WST2 

-90.5/-5 

26 

8 

LWC-WST 

21.1/-3 

18 

7 

LWC2 

-14.7/-4 

513 

21 

LWC2*WST 

20. 2/ -7 

384 

6 

WV 

11.4 

142 

3 

DLT 

-.327 

114 

29* 

R-WST 

0 

23 

LWC2*WV 

49.0/-5 

218 

25 

LWC-WV2 

-29 . 9/-3 

149 

1 

LWC 

.412 

150 

y0  =  134.8 


Oq  =  30.0 
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3.4.3  Inversion  Results 
3.4. 3.1  Kalman  Filtering 

Several  studies  have  been  carried  out  to  evaluate  the  use  of  microwave 
data  for  estimating  cloud,  atmospheric  and  surface  parameters.  The  effect 
of  sequentially  processing  the  data  rather  than  batch  processing  in  the 
extended  Kalman  Filters  was  studied.  Since  the  lower  frequency  channel 
brightness  temperatures  are  more  linear  in  the  data  than  the  higher  frequencies 
the  sequential  processing  order  was  chosen  as:  T^g,  T^,  Tg^.  The  sequential 
processor  is  compared  with  the  batch  processor  in  Table  3.4.8.  Notice  that  the 
IEKF  sequential  processor  generally  does  a  little  worse.  This  suggests  that 
performance  improvements  may  be  rather  small  and  that  the  predominant  remaining 
error  sources  may  be  due  to  modeling  effects.  The  reduction  of  error  with 
sequential  processing  for  the  EKF  estimates  of  LWC  and  CH  can  be  attributed 
directly  to  the  fact  that  the  most  linear  measurements  are  processed  first. 

A  more  detailed  breakdown  of  the  sequential  processor  performance  is 
given  in  Table  3.4.9.  Results  are  shown  for  errors  after  processing  Tjg,  after 
processing  T^g  and  T^,  and  after  processing  T^g,  and  Tg4  in  sequence. 

Also  shown  in  the  table  are  the  Cramer-Rao  bounds  on  performance.  The 
"information  rms"  is  the  upper  bound  on  the  information  available  about  the 
variable  of  interest  from  the  measurements.  For  a  scalar  problem,  this 
quantity  is  the  inverse  of  the  lower  bound  on  the  rms  error.  The  quantity 
"Cramer-Rao  rms"  is  the  lower  bound  on  the  achievable  rms  error  for  any 
unbiased  estimator.  It  thus  represents  the  best  possible  accuracy  with 
which  a  variable  may  be  estimated  from  the  available  data.  Note  that  the 
error  for  DLT  is  near  its  bound  and  both  are  only  slightly  less  than  the 


Table  3.4.8  INVERSION  ERRORS  USING  KALMAN  FILTER 


Iterated  Extended  Kalman  Filter 
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Table  3.4.9  INVERSION  ERRORS  FOR  SEQUENTIAL  PROCESSING 
sequence:  T-jg,  T,,,  Tg^  (803  meas) 


- ~T  -  1 

LWC 

WST 

DLT 

CT 

CH 

WV 

a  priori  mean 

64.56 

13.64 

-.249 

6.456 

32.01 

1.729 

a  priori  rms 

46.92 

5.70 

1.918 

5.759 

13.99 

.385 

II 

information 

D 

rms 

.189 

2.364 

.643 

.248 

.208 

6.726 

Cramer-Rao 

cn 

c 

t/> 

to 

<D 

U 

rms 

12.63 

.512 

1.737 

4.862 

7.686 

.357 

error  mean 

-21.61 

-.483 

.020 

-1 .193 

-1.752 

-.009 

error  rms 

42.99 

4.765 

1.914 

5.516 

13.352 

.378 

II 

information 

rms 

2.537 

3.059 

.789 

.346 

.381 

7.442 

co 

Cramer-Rao 

h- 

C7> 

C 

to 

to 

<1> 

U 

o 

rms 

.409 

.400 

1.580 

3.565 

4.250 

.198 

error  mean 

5.70 

-2.200 

.317 

.170 

-4.613 

-.069 

s- 

error  rms 

16.43 

2.383 

1.972 

4.981 

14.260 

.380 

II 

s. 

information 

D 

rms 

6.385 

3.074 

1.079 

1.313 

.597 

8.522 

cn 

Cramer-Rao 

I— 

rms 

.166 

.395 

1.144 

1.270 

2.813 

.152 

cn 

c 

to 

o 

o 

error  mean 

.210 

-1.202 

.033 

-.394 

-3.374 

-.151 

s~ 

Q. 

error  rms 

16.38 

2.918 

1 .853 

5.091 

14.106 

.377 
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a  priori  value.  Thus,  DLT  is  not,  nor  can  it  be,  estimated  with  any 
confidence  from  the  data. 

It  is  useful  from  a  design  standpoint  to  investigate  the  Cramer-Rao 
bound  in  more  detail.  This  is  done  in  Table  3.4.10.  Shown  in  this  table 
are  the  bounds  as  a  function  of  assumed  rms  measurement  noise  and  modeling 
errors  (a^);  values  of  0.5°,  2°  and  5°  were  used,  assuming  equal  noise  rms 
values  for  each  channel.  The  bound  is  dependent  on  the  types  of  measurements 
used,  but  is  independent  of  whether  they  are  processed  in  batch  or  sequentially, 
or  in  which  order  they  are  processed.  Several  points  should  be  made  based 
on  this  data: 

(1)  Tg^  appears  to  be  the  best  predictor  for  LWC,  by  a  wide  margin. 

The  other  measurements  do  not  reduce  the  bound  significantly.  Note,  however, 
that  if  the  sensor  errors  are  lower  for  low  frequency  channels,  then  T.^  may 
be  better  than  Tg^  for  estimating  LWC. 

(2)  Assuming  that  ar  represents  the  effects  of  modeling  errors,  the 
importance  of  obtaining  accurate  forward  models  is  clearly  shown.  LWC  error 
for  or  =  5°  is  nine  times  that  for  ar  =  1°.  WST  error  bounds  are  also 
increased  dramatically. 

(3)  The  bounds  allow  the  designer  to  estimate  the  importance  of  a 
particular  measurement  directly.  For  example,  it  is  clear  that  one  cannot 
expect  to  achieve  any  significant  performance  improvements  over  the  a  priori 
errors  for  DLT  or  WV,  assuming  a  2°  rms  modeling  error.  In  addition,  if 
the  rms  modeling  error  is  .5°,  only  marginal  gains  in  estimating  CH  would 

be  possible.  However,  it  appears  that  large  improvements  relative  to  a 
priori  rms  errors  for  LWC  and  WST  estimates  are  possible. 

3. 4. 3. 2  Nonlinear  Regression 

A  series  of  inversions  has  been  made  on  the  new  data  base  of  941  cases. 

The  results  are  summarized  in  Table  3.4.11,  which  shows  the  rms  inversion  errors 
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Table  3.4.10  CRAMER -RAO  BOUNDS  FOR  INVERSION  ERRORS 
•  rms  errors  (863  meas) 


LWC 

WST 

0LT 

CT 

CH 

WV 

a  priori  rms 

99.3 

5.72 

1 .85 

7.29 

14.2 

- ! 

.39  i 

■■■ 

(■■■■■■ 

■■■■■■ 

T19 

9.5 

.26 

1.70 

4.84 

5.53 

.31 

T37 

.22 

.32 

1.54 

2.47 

3.72 

.31 

o 

LD 

T94 

.10 

3.70 

1.47 

.81 

2.06 

.22 

O 

II 

T19,T37 

.21 

.20 

1.46 

2.37 

2.72 

.12 

B 

T19’T94 

.10 

.26 

.80 

.77 

1 .82 

.09 

H 

T37’T94 

.08 

.31 

.78 

.71 

1.64 

.13 

■ 

T19,T37,T94 

.08 

.20 

.72 

.67 

1.50 

.08 

■ 

T19 

17.3 

1.00 

Ri 

4.90 

10.40 

.37 

B 

T37 

.82 

1.18 

4.46 

7.28 

.37 

H 

T94 

.37 

4.96 

m 

2.57 

6.52 

.33 

CM 

II 

T1 9  *  T37 

.81 

.76 

1.64 

4.42 

6.56 

.29 

B 

T19’T94 

.37 

.94 

1.51 

2.48 

6.00 

.25 

II 

T37’T94 

.33 

1.13 

1.48 

2.28 

5.16 

.30 

■ 

TI9,T37’T94 

.33 

.75 

1.45 

2.23 

4.92 

.24 

fl 

T19 

31 .2 

2.29 

19 

5.10 

12.43 

.38 

Si 

T37 

2.00 

2.49 

m 

4.87 

10.84 

.38 

1 

T94 

.89 

5.51 

1.71 

3.93 

10.33 

.36 

LT) 

II 

T19,T37 

1.99 

1.69 

1.75 

4.85 

10.33 

.36 

D 

T19’T94 

.88 

2.06 

1.68 

3.89 

9.83 

.34 

T37’T94 

.81 

2.43 

1.66 

3.77 

8.98 

.36 

T19,T37’T94 

.80 

1.68 

_ 

1.65 

_ 

3.73 

8.70 

.34 

nonlinear  linear 
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Table  3.4.11  REGRESSION  INVERSION  ERRORS 
std  dev  in  °K  (941  meas) 


LWC 

-Sj-  x  103 
cnr 

WST 

m 

DLT 

°K 

CT 

m  x  103 

CH 

m  x  10^ 

WV 

9 

R 

mm 

s 

2 

cm 

hr 

312.2 

5.75 

2.21 

9.80 

13.55 

0.477 

5.78 

111 

4.33 

2.17 

5.35 

10.88 

.27 

2.54 

3/4 

3/4 

4/4 

3/4 

4/4 

4/4 

4/4 

47.34 

1.75 

2.12 

4.9 

8.1 

.17 

1.98 

6/30 

7/30 

4/30 

5/30 

10/30 

7/30 

6/30 

45.52 

1.58 

4.75 

7.4 

.124 

8/30 

10/30 

8/30 

16/30 

13/30 

43.11 

1.35 

10/30 

15/30 
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for  both  linear  and  nonlinear  regressions.  The  notation  "a/b"  in  the  table 
indicates  that  "a"  out  of  a  total  of  "b"  possible  predictors  were  selected 
by  the  stepwise  regression  algorithm.  A  listing  of  the  total  set  of  30 
predictors  tried  for  inversion  is  given  in  Table  3.4.12,  along  with  the  mean 
and  rms  value  of  each  predictor. 

The  individual  regression  models  are  given  in  Tables  3.4.13  -  3.4.19,  along 
with  F  ratio  and  the  partial  correlations  before  entering  the  first  predictor. 
Also  shown  are  the  rms  fit  errors  at  the  points  corresponding  to  values  in 
Table  3.4.11. 


i 
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Table  3.4.12(b) 


Variable 

Mean 

a 

# 

T37/T10 

0.808 

0.0553 

21 

T37/T19 

0.633 

0.1110 

22 

T37/T94 

1.243 

0.0794 

23 

T94/T10 

1.617 

0.2164 

24 

T94/Tl 9 

0.781 

0.0979 

25 

T94/T37 

1.298 

0.1436 

26 

T10*T10 

17009 

5607 

27 

T19*T19 

27909 

12169 

28 

T37*T37 

42022 

12942 

29 

T94*T94 

67378 

6714 

30 

LWC 

165.6 

312.2 

WST 

13.69 

5.75 

DLT 

0 

2.21 

CT 

9.44 

9.80 

CH 

32.57 

13.55 

WV 

1.845 

0.477 

R 

1.701 

5.775 

Predictor 

Coeff 

^10 

-68.0/4 

T 1 9  *  T1 9 

83. 7/-3 

T37/T94 

-1436 

VT94 

-12.0/4 

T10/T94 

729 

T10/T19 

-0.16 

T10 

1.94 

T10*T37 

0.12 

T37 

-16.0 

o 

• 

o 

62. 7/ -3 

T19*T37 

-37 . 5/-3 

T19 

-18.9 

T1 9  *T37 

1810 

1/T19 

55.5/4 

T94/T10 

-35.7 

T94 

0.748 

T10/T19 

-6073 

T94/T37 

-179 

-0.495 

0.235 

-0.282 

-0.103 

0.366 

0.282 

0.388 

0.296 

0.247 

0.334 

0.240 

0.288 

-0.239 

-0.395 

-0.385 

0.098 

-0.003 

-0.306 


5 
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Table  3.4.15  INVERSION  FOR  DLT 


Predictor 

Coeff 

F 

corr 

a 

T94*T94 

0 

0.109 

T94 

0.32 

49 

0.103 

1/T37 

12.8/3 

39 

0.030 

T94/T10 

-3.35 

13 

0.061 

T94/T37 

-41.9 

28 

0.082 

2.12 

Table  3.4.17  INVERSION  FOR  CH 


Predictor 

Coeff 

F 

corr 

0 

T94/T37 

-1740 

64 

-0.397 

T37/T37 

0.147 

119 

0.232 

T10/T37 

-10.6/3 

202 

-0.07 

T94*T94 

16.8/-3 

15 

-0.242 

T10*T94 

51.7/-3 

93 

0.114 

T94/T19 

-8868 

145 

0.369 

T37T94 

-0.121 

99 

0.128 

T1 9  *T1 9 

40.6/-3 

20 

0.198 

T37/T94 

-4751 

95 

-0.014 

8.09 

T19'T37 

-0.16 

97 

0.217 

T10/T19 

7911 

173 

-0.159 

T19‘T94 

41.5/-3 

96 

0.131 

T37/Tl 9 

6266 

56 

0.285 

T19/T37 

2276 

70 

0.086 

T19/T10 

-1716 

14 

0.158 

T94 

-2.39 

4 

-0.229 

7.40 

yQ=  10,823 


o  = 13.55 
o 
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Predictor 

Coeff 

F 

corr 

T19/T10 

0 

0.702 

T37/T94 

-73.3 

430 

-0.485 

T94/Ti 9 

-89.0 

394 

0.567 

T19/T37 

57.2 

360 

0.028 

T10/T19 

-403 

754 

-0.701 

,/T19 

17.6/3 

123 

-0.655 

^10 

-15.7/3 

107 

-0.584 

T37/T10 

-75.5 

281 

0.509 

T94/T10 

73.3 

102 

-0.614 

T19/T94 

-64.9 

112 

-0.479 

T10/T94 

246 

323 

0.527 

T37/T19 

-129 

412 

0.650 

— 1 

VO 

• 

H 

vo 

-P* 

-31. 5/ -5 

39 

0.681 

T10’T10 

30.9/-5 

20 

0.621 

0.170 


0.124 


Table  3.4.19  INVERSION  FOR  R 


Coeff 

F 

corr 

14.6/-3 

126 

0.868 

-47.1/-1 

201 

0.334 

-16.4/-3 

27 

0.858 

-0.735 

-42.2/1 

137 

-0.593 

11.6/-3 

25 

0.836 

-37.2/3 

155 

-0.750 
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3.5  Inversion  Errors  for  Data  Set  #3 

3.5.1  Data  Base 

Data  Base  #2  was  modified  in  several  ways  to  produce  Data  Base  #3. 

The  number  of  thin  clouds  with  high  LWC  was  reduced  to  more  closely  match 
their  small  probability  of  occurrence.  The  frequency  of  high  rain  rates 
(>  12  mm/hr)  was  reduced  and  several  low  rain  rate  cases  (<  3  mm/hr)  were 
added.  Rain  layers  up  to  4000  meters  thick  were  added.  The  clear  column  cases 
were  removed  so  that  all  cases  were  run  over  clouds.  Since  ice  clouds  cannot 
be  detected  from  microwave  measurements,  these  cases  were  eliminated  f/om 
the  data  set,  leaving  only  water  clouds.  The  summary  of  cases  for  Data  Set  #3 
is  given  in  Table  3.5.1.  The  resulting  parameter  correlation  matrix  is  given 
in  Table  3.5.2. 

3.5.2  Forward  Models 

The  forward  models  determined  via  nonlinear  regression  are  given  in 
Table  3.5.3  for  all  four  microwave  channels  and  are  seen  to  be  highly  nonlinear. 
The  significance  of  the  predictors  can  be  determined  by  examining  Table  3.5.4, 
which  indicates  the  order  in  which  the  predictors  were  entered  into  the  models 
and  gives  the  F  ratios  for  the  final  models. 
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TABLE  3.5.1 

Summary  of  Data  Set  #3 
for  Microwave  Measurements 


Wind  Speed  (m/sec) 


i 

8 

10 

12.50 

15 

17.50 

20 

22.5 

# 

311 

20 

10 

20 

10 

281 

10 

Cloud  Thickness  (m) 


330 

5  00 

660 

1000-2350 

2500 

2600-3600 

3700-4200 

# 

20 

322 

20 

60 

132 

54 

54 

Cloud  Top  Height  (m) 


660 

1320 

1500 

2500 

2900 

3000 

3400 

4000 

4500 

4900 

# 

20 

20 

56 

14 

190 

40 

2 

66 

32 

222 

2 

Water  Vapor  (gm/m  ) 

.89-1. 

02 

1.38-1.62 

1.70-1.93 

2.01-2.48 

2 

.61-2.89 

Li 

40 

100 

•  280 

102 

140 

Rain 

Rate 

(mm/hr) 

0 

1 

2.4 

3 

5 

7 

12 

15 

24 

30 

# 

498 

12 

24 

12 

16 

22 

24 

16 

22 

16 
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TABLE  3.5.2 

Parameter  Correlation  Matrix 
for  Data  Set  #3 


LWC 

WST 

DLT 

CT 

CH 

WV 

R 

LWC 

427.0 

WST 

0.020 

5.83 

DLT 

0 

0 

2.09 

CT 

0.845 

0.028 

0 

12.20 

CH 

0.245 

0.041 

0 

0.436 

12.85 

WV 

0.735 

0.014 

^  0 

0.695 

0.031 

0.539 

R 

0.766 

0.014 

0 

0.615 

0.096 

0.619 

6.80 

mean 

291.4 

13.80 

0 

15.09 

35.01 

2.012 

2.83 

Note:  diagonal  elements  are  rms  values 


MICROWAVE  FORWARD  REGRESSION  MODELS 
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3.5.3  Inversion  Results 

Inversions  were  carried  out  using  nonlinear  regressions  with  a  single 
channel  and  noise-free  data.  The  results  are  shown  in  Table  3.5.5. 

Predictors  tried  included  powers,  inverses,  exponentials  and  logarithms 
of  the  basic  predictor.  Terms  employed  in  the  model,  at  the  5%  significance 
level,  are  indicated  by  x.  The  resulting  rms  errors  are  shown  at  the  bottom, 
along  with  the  apriori  values.  The  values  1,  2,  3,  4  denote  the  10.7,  19, 

37  and  94  GHz  channels,  respectively. 

Next,  a  series  of  linear  regressions  using  all  four  channels  were  run 
using  noise-free  data  on  both  fitting  and  testing  sets.  The  results  are 
shown  in  Tables  3.5.6  -  3.5.12.  It  can  be  seen  that  R2  for  LWC  and  CT 

is  reasonably  high,  WV  and  R  values  are  moderate  and  CH  and  WST  values  are 

2 

low,  R  for  DLT  is  less  than  2%,  indicating,  as  expected,  that  atmospheric 
temperature  variations  cannot  be  predicted  accurately  from  microwave 
measurements  over  water  clouds. 

A  series  of  nonlinear  regressions  were  run  to  assess  the  possibility 
of  improving  the  regression  inversions.  The  results  are  shown  in  Tables 
3.5.13  -  3.5.19.  In  each  table,  the  predictors  are  listed  in  order  of 
entry  into  the  model,  together  with  the  regression  coefficient  at  the 
final  step.  The  running  values  of  goodness-of-fit  (R  )  in  percent  and 
rms  fit  error  are  also  given.  An  asterisk  denotes  a  predictor  that  was 
subsequently  deleted  from  the  model  at  the  5%  significance  level, 

One  of  the  problems  encountered  in  regression  analysis  is  non¬ 
robustness  in  the  presence  of  noise.  High  order  regression  models  have 
a  tendency  to  overfit  the  data,  resulting  in  excessively  large  errors  in 


the  presence  of  noise  in  the  test  set  (i.e.,  in  practice).  An  indication 
of  this  problem  is  given  in  Tables  3.5.20  -  3.5.26  which  show  the  effect 
of  adding  noise  to  the  test  set  with  rms  values  of  1°  and  5°  over  all 
channels.  The  noise  on  each  channel  was  independent  of  the  noise  on  all 
other  channels.  The  data  show  the  actual  rms  errors  for  the  test  set  for 
increasing-order  models.  Note  that  for  low-order  (less  than  3  or  4)  models, 
there  is  generally  only  a  small  degradation  of  performance.  However, 
degradation  can  become  severe  at  high  orders  (generally  greater  than 
4  or  5).  A  comparison  of  noise  sensitivity  for  both  regression  and  Kalman 
Filtering  methods  is  given  in  Tables  3.5.27  -  3.5.32.  Two  values  of  rms 
measurement  noise  were  used  for  the  Kalman  Filters:  =  0.5°,  2°;  i.e., 

V  =  (0.5)^  or  (2)^  ,  with  1^  the  4x4  identity  matrix.  It  can  be 

seen  that  the  nonlinear  regression  method  is  much  more  sensitive  to  noise 
than  the  Kalman  Filters,  even  for  1°  rms  noise.  Note  also  that  the  LWC 
errors  for  the  Kalman  Filters  are  minimized  by  assuming  an  rms  measurement 
error  of  about  4°,  which  accounts  principally  for  modeling  error. 

Another  important  aspect  of  inversion  is  the  possiblity  of  data 
dropout,  which  could  be  caused  by  loss  of  communication,  loss  of  significance 
(bits),  intermittent  sensor  problems,  etc.  In  this  case,  something 
unexpected  has  happened;  however,  it  is  still  desirable  to  be  able  to 
process  the  remaining  data.  In  the  regression  approach,  the  inversion 
depends  on  all  of  the  data  being  present.  For  absent  data,  one  could 
simply  use  an  apriori  mean.  Of  course,  one  can  also  produce  regression 
models  for  all  possible  combinations;  however,  this  would  be  an  inefficient 
procedure  in  practice.  The  Kalman  Filter  method  takes  missing  data  into 
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account  in  a  very  natural  way.  The  effect  of  missing  data  (one  channel  out) 
on  regression  errors  is  shown  in  Tables  3.5.33  -  3.5.38  for  both  fitting 
to  the  missing  data  set  (model)  and  using  the  apriori  mean  (ave).  The  (ave) 
values  for  regression,  denoted  by  "reg",  are  compared  to  Kalman  Filter 
regression  in  Tables  3.5.39  -  3.5.40.  Both  mean  and  rms  errors  are  given 
and  show  that  the  Iterated  Extended  Kalman  Filter  is  the  most  tolerant  to 
missing  data. 

3.5.4  Cramer-Rao  Bounds 

The  preceding  results  have  been  predicated  on  several  specific  inversion 
technioues.  However,  none  of  these  can  be  said  to  be  truly  optimum.  For 
example,  there  are  many  existing  Bayesian  techniques  which  could  not  be 
studied,  due  to  the  limited  scope  of  this  study.  Rather  than  search  for 
the  "best"  overall  method,  it  is  helpful  to  first  consider  the  question 
of  achievable  performance,  given  the  available  sources  of  information. 

That  is,  we  wish  to  consider  the  extent  to  which  the  apriori  errors  can 
be  reduced  using  the  four  selected  microwave  channels.  A  technique  which 
can  be  used  for  this  purpose  is  the  Cramer-Rao  bound.  As  shown  in 
Appendix  B,  lower  bounds  on  the  rms  estimation  errors  for  each  parameter 
to  be  inverted  for  can  be  computed  from  knowledge  of  the  measurement 
function  h(*)  and  the  statistics  of  the  measurement  errors  (v).  A  summary 
of  the  achievable  rms  errors  is  given  in  Tables  3.5.41  and  3.5.42  for  all 
sensor  combinations.  The  sensor  rms  errors  were  a  =  0.5°,  1.0°,  2.0°, 
and  5.0°.  In  each  case  the  errors  for  each  channel  were  independent  of 
the  other  channel  errors,  but  the  rms  values  were  identical  for  all  channels. 
It  should  be  noted  that  the  bounds  are  independent  of  the  order  in  which 
the  data  are  processed. 
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Table  3.5.6 


INVERSION  FOR  INTEGRATED  LIQUID  WATER  VIA  LINEAR 
REGRESSION  USING  MICROWAVE  MEASUREMENTS 


Variable 
in  Order 
of  Entry 

Coefficient 
(Final ) 

R2 

(At  Step) 

a 

(At  Step) 

mean 

2.911  x  lo2  a 

0 

427.0 

T1 9 

1 .723  x  101 

.861 

159.1 

T37 

-7.312 

.898 

136.5 

T94 

1.505 

.898 

136.1 

T10 

-2.223 

.899 

135.6 

Intercept :  -1 . 298  *  1 0^ 


Table  3.5.7 

INVERSION  FOR  WIND  SPEED  VIA  LINEAR 
REGRESSION  USING  MICROWAVE  MEASUREMENTS 


Variable 
in  Order 
of  Entry 

Coefficient 
(Final ) 

R2 

(At  Step) 

a 

(At  Step) 

mean 

1.380xl0]  a 

0 

5.83 

T10 

4.965  x  10-1 

7.8 

5.59 

T1 9 

-3.820 x  10_1 

16.1 

5.34 

T37 

no 

VO 

00 

X 

o 

1 

21 .3 

5.17 

Intercept:  -1.388  x  10^ 


amean  value 


i 


INVERSION  FOR  DLT  VIA  LINEAR 
REGRESSION  USING  MICROWAVE  MEASUREMENTS 


662  meas.  over  water  clouds 


Variable 
in  Order 
of  Entry 

— 

Coefficient 
(Final ) 

R2 

(At  Step) 

a 

(At  Step) 

mean 

0a 

0 

2.09 

T94 

3.123  x 10‘2 

.009 

2.09 

T37 

-5.638  x 10“3 

.016 

2.08 

Intercept:  -6.847 


amean  value 


Table  3,5.9 

INVERSION  FOR  CLOUD  THICKNESS  VIA  LINEAR 
REGRESSION  USING  MICROWAVE  MEASUREMENTS 

662  meas.  over  water  clouds 


Variable 
in  Order 
of  Entry 

Coefficient 
(Final ) 

R2 

(At  Step) 

o 

(At  Step) 

mean 

1 . 509  x  101  a 

0 

12.20 

T37 

1 .638  x  10"1 

.770 

5.85 

T94 

-1 .623  x  10'1 

.794 

5.53 

T19 

2. 1 22  x  10'1 

.803 

5.41 

T10 

-1 .393  x  10'1 

.807 

5.36 

amean  value 


Intercept:  3.484 


V 
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Table  3.5.10 

INVERSION  FOR  CLOUD  TOP  HEIGHT  VIA  LINEAR 
REGRESSION  USING  MICROWAVE  MEASUREMENTS 

662  meas,  over  water  clouds 


Variable 
in  Order 
of  Entry 

Coefficient 
(Final ) 

R2 

(At  Step) 

a 

(At  Step) 

mean 

3.501  x  101  a 

0 

12.85 

T37 

5.908 x  10'1 

.131 

11.98 

T94 

-6.042 x  10'1 

.217 

11.38 

T1 9 

-6.492  x  10’1 

.265 

11.02 

T10 

4.939  x  10_1 

.299 

10.76 

2 

Intercept:  1.128x10^  amean  value 


Table  3.5.11 

INVERSION  FOR  ATMOSPHERIC  WATER  VAPOR  VIA  LINEAR 
REGRESSION  USING  MICROWAVE  MEASUREMENTS 

662  meas.  over  water  clouds 


Variable 
in  Order 
of  Entry 

Coeffic lent 
(Final ) 

R2 

(At  Step) 

0 

(At  Step) 

mean 

2.01  xio"1 

0 

0.539 

T1 9 

3.816  x  10"2 

.575 

0.352 

T10 

-2.633  x  10‘2 

.589 

0.346 

T37 

-1 .618  x  10'2 

.627 

0.329 

T94 

1 .343  x  10‘2 

.665 

0.312 

amean  value 


Intercept:  -1.183 
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Table  3.5.12 

INVERSION  FOR  RAIN  RATE  VIA  LINEAR 
REGRESSION  USING  MICROWAVE  MEASUREMENTS 
662  meas.  over  water  clouds 


Variable 
in  Order 
of  Entry 

Coefficient 
(Final ) 

R2 

(At  Step) 

a 

(At  Step) 

mean 

2.83a 

0 

6.98 

T19 

5.223  x  10'1 

54.7 

4.58 

T37 

-2.846  x  10_1 

64.1 

4.08 

T10 

-2.725  x  10"1 

68.5 

3.89 

T94 

1 .196  x  10"1 

69.2 

3.77 

Intercept:  -2.298x  10^  amean  value 


Table  3.5.13 
REGRESSION  FOR  LWC 


Variable 
in  Order 
of  Entry 

Coefficient 

(Final) 

R2 

(At  Step) 

0 

(At  Step) 

CONST 

2.914  x  102  a 

0 

427.0 

T1 9 

-4.163  x  10’2 

.896 

138.1 

1/T19 

6.063  x  105 

.929 

114.1 

exp(T]9) 

-3.953  x  10"117 

.937 

107.7 

T  *T  ^ 

‘lO  '37 

0 

.945 

100.5 

exp(T37) 

-1 . 528  x  1 0_1 18 

.948 

97.5 

T1 9 

3.821  x  101 

.950 

96.0 

T94/T10 

1.694  x  103 

.950 

95.5 

T10/T94 

6.861  x  103 

.955 

91.4 

T  -T 
'37  '94 

7.077  x  10-3 

.957 

89.4 

Intercept:  -1 .598  x  10 

a 

b 

mean  value 

deleted  in  later  step 

Table  3.5.14 


REGRESSION  FOR  WST 


Variable 
in  Order 
of  Entry 

Coefficient 

(Final) 

R2 

(At  Step) 

a 

(At  Step) 

CONST 

1 .380x  101  a 

0 

5.83 

^10 

-7.497  x  104 

.124 

5.46 

T?9 

8.611  x  10‘3 

.722 

3.08 

T1 0T37 

8.985  x  10"3 

.754 

2.90 

exp(T3?) 

6.642 x  10‘120 

.769 

2.81 

exp(T]g) 

-4.161  x  10"119 

.779 

2.75 

T37/T19 

-4.204  x  102 

.798 

2.63 

T10/T94 

-2. 166  x  102 

.828 

2.43 

1/T37b 

0 

.832 

2.40 

T94/T37 

-2. 188  x  102 

.841 

2.34 

T1 9 

-8.762 

.860 

2.19 

T1 0/Tl 9 

-7.164  x  102 

.868 

2.13 

T94/T19 

1 . 1 55  x  1 02 

.873 

2.09 

Intercept:  2.858  x  10 


amean  value 


deleted  in  later  step 
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Table  3.5.15 
REGRESSION  FOR  DLT 


Variable 
in  Order 
of  Entry 

Coefficient 
(Final ) 

R2 

(At  Step) 

o 

(At  Step) 

CONST 

0a 

0 

2.09 

exp(Tg4) 

2.380  x  10‘123 

.034 

2.06 

exp(T37) 

3.663  x  10“120 

.058 

2.04 

Intercept:  -7.821  x  10 


mean  value 
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Table  3.5.16 
REGRESSION  FOR  CT 


Variable 
in  Order 
of  Entry 

Coefficient 

(Final) 

R2 

(At  Step) 

a 

(At  Step) 

CONST 

1 . 509  x  1 01  a 

0 

12.20 

T37/T94 

7.329  x  101 

.795 

5.53 

T94/T37b 

0 

.808 

5.35 

exp(T]g) 

-2.329  x  10”^9 

.813 

5.28 

T?o 

-1.127  x 10"2 

.821 

5.17 

^10 

7.866  x  104 

.840 

4.90 

1/T19 

-3.293  x  104 

.844 

4.83 

T19/T10 

-1 . 230  x  1 02 

.847 

4.80 

.  Tio 

5.906 

.850 

4.76 

2 

Intercept:  -8.737  x  10 


mean  value 

deleted  in  later  step 
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Table  3.5.17 
REGRESSION  FOR  CH 


Variable 
in  Order 
of  Entry 

Coefficient 
(Final ) 

R2 

(At  Step) 

0 

(At  Step) 

CONST 

3.501  x  101  a 

0 

12.85 

T  /T  b 

1 94' 1 37 

0 

.189 

11.59 

t2 

37 

4.940 x  10"2 

.346 

10.41 

T94/T19 

-5.450 x  102 

.377 

10.17 

T37/T19 

-2.531  x 103 

.500 

9.11 

T19/T37 

-5.371  x  103 

.541 

8.74 

T?9 

1 .495  x  10'2 

.613 

8.03 

T19/T94 

-3.268  x  101 

.630 

7.86 

T1 9 " T37 

-6.058 x  10‘2 

.638 

7.78 

T94/T10 

2.492  x  102 

.654 

7.60 

T19/T10 

-2.572  x  102 

.660 

7.54 

T37T94 

-2.370  x  105 

.669 

7.45 

Intercept:  8.423  *  10“ 


mean  value 


deleted  in  later  step 


J 


Table  3.5.18 
REGRESSION  FOR  WV 


Variable 
in  Order 
of  Entry 

Coefficient 
(Final ) 

.  R2 

(At  Step) 

a 

(At  Step) 

CONST 

2 . 01 2a 

0 

.539 

T  T  b 

19'  94 

0 

.591 

.345 

1/T10 

4.837  *  103 

.632 

.328 

T37/Tl 9 

-1.469  *  101 

.705 

.294 

Vt19 

-3.900  *  103 

.740 

.276 

T1 9/Tl 0 

-1.728  *  101 

.824 

.227 

T37  T94 

3.214  *  10"4 

.834 

.221 

T94/T37 

-6.536 

.839 

.217 

T37/T94 

-1.941  *  101 

.852 

.208 

exp(T3?) 

-3.600  *  10-121 

.856 

.206 

T10'T94 

-5.796  *  10"4 

.857 

.205 

T10 

1.520  *  10_1 

.861 

.202 

Intercept:  3.480  *  10 


5 mean  value 
deleted  in  later  step 


Table  3.5.19 
REGRESSION  FOR  R 


Variable 
in  Order 
of  Entry 


CONST 


T  •  T 
'10  37 


exp(T37) 


T94/Tl 9 


T37/Tl 9 


T10,T19 


T19/T37 

T37/T94 

T94/T10 


Intercept:  -5. 1 20  *  10 


Coefficient 
(Final ) 

R2 

(At  Step) 

2.83a 

0 

5.777  x  10"3 

.590 

-5.828  x  10-4 

.703 

1 .058  x  10"119 

.730 

0 

.752 

6.223  x  10"3 

.756 

2.754  x  102 

.770 

-8.053 x  102 

.774 

-1 .129  x  10-2 

.778 

3.409  x  102 

.781 

-7.894  x  101 

.785 

-2.075  x  IQ1 

.789 

(At  Step) 


mean  value 

deleted  in  later  step 
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TABLE  3.5.20 

EFFECT  OF  NOISE  ON  LWC  REGRESSION 
RMS  Errors 

°model  ^ 


Order 

o 

II 

!  c 

D 

°n  =  1° 

°n-5° 

0 

427.0 

427.0 

427.0 

1 

138.0 

138.7 

147.1 

2 

114.0 

114.7 

123.6 

3 

107.5 

111.0 

182.4 

4 

100.2 

111.0 

342.9 

5 

95.6 

110.6 

6649 

6 

95.1 

109.7 

6393 

7 

89.1 

108.3 

6405 

8 

85.8 

119.2 

6924 

9 

84.5 

122.8 

4453 

10 

82.7 

140.4 

3386 

TABLE  3.5.23 

EFFECT  OF  NOISE  ON  CT  REGRESSION 
RMS  Errors 

°model  "  ® 


152- 


TABLE  3.5.24 

EFFECT  OF  NOISE  ON  CH  REGRESSION 
RMS  Errors 

°model  ~  ® 


i 
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TABLE  3.5.25 

EFFECT  OF  NOISE  ON  WV  REGRESSION 
RMS  Errors 

amode1  "  ^ 


an  =  0 

*n  =  1° 

an  =  5° 

.539 

.539 

.539 

.345 

.345 

.350 

.327 

.328 

.357 

.293 

.298 

.422 

.275 

.291 

.574 

.226 

.283 

.920 

.220 

.276 

.936 

.220 

.276 

.980 

.216 

.275 

9.36 

.207 

.278 

8.88 

.204 

.277 

8.13 
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TABLE  3.5.26 

EFFECT  OF  NOISE  ON  R  REGRESSION 
RMS  Errors 

amodel  ~  ^ 


Order 

- - - 

°n  =  0 

Q 

3 

II 

o 

°n  =  5° 

0 

6.80 

6.80 

6.80 

1 

4.35 

4.36 

4.41 

2 

3.70 

3.76 

4.71 

3 

3.53 

3.62 

298 

4 

3.38 

3.46 

286 

5 

3.26 

3.35 

313 

6 

3.23 

3.33 

314 

7 

3.20 

3.29 

271 

8 

3.18 

3.28 

278 

9 

3.12 

3.20 

236 
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TABLE  3.5.27 

INTEGRATED  LIQUID  WATER  INVERSION 
ERRORS  USING  MICROWAVE  MEASUREMENTS 
•  662  meas  over  water  clouds 


— 

Regression 

Kalman 

Filter 

linear 

nonlinear 

EKF 

IEKF 

a  priori 

427.0 

427.0 

427.0 

427.0 

(0) 

(0) 

(0) 

(0) 

o 

II 

c 

o 

135.6 

88.9 

292.1 

306.1 

(231 .1) 

(89.6)a 

m 

o 

0=1° 

n 

140.4 

291 .3 

316.5 

B 

(231.4) 

(100.6) 

I 

o=5° 

n 

3386 

291 .6 

349.0 

B 

(232.9) 

(113.8) 

■ 

O 

II 

c 

D 

135.6 

88.9 

287.3 

235.1 

I 

(225.4) 

(76.2) 

o 

II 

C 

o 

140.4 

286.6 

228.7 

1 

(225.7) 

(72.0) 

a  *  5° 
n 

3386 

286.8 

293.0 

■ 

(227.1) 

(92.0) 

o=4° 

r 

135.6 

88.9 

276.9 

182.8 

o=0 

n 

(214.3) 

(64.3) 

■ 

135.6 

88.9 

266.6 

189.9 

■ 

(204.2) 

(68.9) 

a  numbers  in  parentheses  are  mean  errors 
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TABLE  3.5.28 

WIND  SPEED  INVERSION  ERRORS  USING 
MICROWAVE  MEASUREMENTS 
•  662  meas  over  water  clouds 


Regression 

Kalman 

Filter 

linear 

nonlinear 

EKF 

IEKF 

a  priori 

5.83 

5.83 

5.83 

5.83 

(0) 

(0) 

(0) 

(0) 

on  =  0 

5.17 

2.02 

5.89 

6.92 

(-1.35) 

( -4 . 72 ) 3 

LO 

o 

a  =  1° 
n 

2.91 

5.88 

7.18 

H 

(-1.36) 

(-4.80) 

■ 

a  =  5° 
n 

132.5 

5.93 

8.02 

■ 

(-1.38) 

(-3.69) 

■ 

°n  =  0 

5.17 

2.02 

5.85 

6.67 

I 

(-1.24) 

(-3.76) 

a  =1° 
n 

2.91 

5.85 

6.75 

l 

-1.24 

(-3.80) 

a  =  5° 
n 

132.5 

5.89 

7.40 

■ 

(-1.27) 

(-3.76) 

a  =  4° 
r 

5.17 

2.02 

5.78 

6.91 

°n  =  0 

(-1.01) 

(-3.12) 

o=6° 

r 

5.17 

2.02 

5.72 

6.66 

c=o 

n 

(-0.79) 

(-2.71) 

_ 

numbers  in  parentheses  are  mean  errors 


TABLE  3.5.29 

CLOUD  THICKNESS  INVERSION  ERRORS  USING 
MICROWAVE  MEASUREMENTS 
•  662  meas  over  water  clouds 


Regression 

Kalman 

Filter 

linear 

nonlinear 

EKF 

IEKF 

12.20 

12.20 

12.20 

12.20 

(0) 

(0) 

(0) 

(0) 

5.36 

4.67 

7.21 

8.46 

(-3.50) 

(  - 1  •  1 3) a 

4.75 

7.27 

9.32 

(-3.49) 

(-1.15) 

7.30 

8.89 

12.65 

(-3.45) 

(-1.49) 

5.36 

4.67 

7.36 

7.70 

(-1.96) 

(-0.43) 

4.75 

7.38 

7.61 

(-1.95) 

(-0.47) 

7.30 

8.53 

9.52 

(-1.90) 

(-0.37) 

5.36 

4.67 

7.66 

6.98 

(0.28) 

(-0.24) 

5.36 

4.67 

7.82 

7.05 

0.47) 

(0.27) 

a  priori 


on  =  0 


°  I  °n  =  V 


o=5° 

n 


o=0 

n 


~  I  °n  =  V 


°n  =  5' 


o=6° 

r 

o=0 

n 


numbers  in  parentheses  are  mean  errors 
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TABLE  3.5.30 

CLOUD  TOP  HEIGHT  INVERSION  ERRORS 
USING  MICROWAVE  MEASUREMENTS 
•  662  meas  over  water  clouds 


Regression 

Kalman 

Filter 

linear 

nonlinear 

EKF 

IEKF 

a  priori 

12.85 

12.85 

12.85 

12.85 

(0) 

(0) 

(0) 

(0) 

°n  =  0 

10.76 

7.31 

15.10 

12.90 

(-4.40) 

( -3 . 93  a 

in 

o 

Os  1° 
n 

8.13 

15.12 

16.15 

(-4.40) 

(-3.76) 

I 

a  =  5° 
n 

21.77 

15.50 

25.4 

■ 

-4.37 

(-3.0) 

■ 

o 

II 

c 

D 

10.76 

7.31 

15.04 

11.00 

■ 

(-4.84) 

(-3.96) 

a  =  1° 

8.13 

15.06 

11.17 

1 

(-4.83) 

(-3.87) 

a  =  5° 
n 

21.77 

15.40 

15.27 

■ 

(-4.81  ) 

(-3.57) 

o=4° 

r 

10.76 

7.31 

14.90 

10.80 

°n  =  0 

(-5.41) 

(-3.30) 

■ 

BB 

10.76 

7.31 

14.71 

10.65 

■ 

HD 

(-5.54) 

(-2.69) 

numbers  in  parentheses  are  mean  errors 

k  A 


TABLE  3.5.31 

WATER  VAPOR  INVERSION  ERRORS 
USING  MICROWAVE  MEASUREMENTS 
•  662  meas  over  water  clouds 


Regression 

Kalman 

Filter 

linear 

nonlinear 

EKF 

IEKF 

.539 

(0) 

.539 

(0) 

.539 

(0) 

.539 

(0) 

.312 

.204 

.326 

.327 

(-.188) 

( -.001 )a 

.277 

.330 

.386 

(-.189) 

(-.001) 

8.13 

.419 

.678 

(-.188) 

(.023) 

.312 

.204 

.330 

.325 

(-.10) 

(.050) 

.277 

.333 

.327 

(-.097) 

(.049) 

8.13 

.392 

.440 

(-.098) 

(.058) 

.312 

.204 

.352 

.332 

(.038) 

(.047) 

.312 

.204 

.368 

.346 

(.114) 

(.053) 

a  priori 


°n  =  0 


o 

in 

d  on  -  T 


o„  =  50 


op  =  0 


~  I  =  10 


o=5° 

n 


°r  =  6 


numbers  in  parentheses  are  mean  errors 


TABLE  3.5.3? 

RAIN  RATE  INVERSION  ERRORS  USING 
MICROWAVE  MEASUREMENTS 
•  662  meas  over  water  clouds 


Regression 

Kalman 

Filter 

linear 

nonl inear 

EKF 

IEKF 

6.80 

6.80 

6.80 

6.80 

(0) 

(0) 

(0) 

(0) 

3.77 

3.12 

4.62 

4.80 

(-4.80) 

( 2 . 64 ) a 

3.20 

4.68 

5.03 

(-4.81) 

(2.50) 

236 

6.01 

6.44 

(-4.86) 

(0.94) 

3.77 

3.12 

4.27 

4.44 

(-3.44) 

(2.30) 

3.20 

4.32 

4.48 

(-3.45) 

(2.31) 

236 

5.31 

5.31 

(-3.50) 

(2.09) 

3.77 

3.12 

4.03 

4.28 

(-1.31) 

(1.94) 

3.77 

3.12 

4.13 

4.21 

(0.05) 

- - - 

(1.59) 

a  prion 


°n  =  0 


O  I 

d  on  .  r 


o=5° 

n 


°n-° 


~  °n  =  10 


a  =  5C 
n 


numbers  in  parentheses  are  mean  errors 


REGRESSIONS  FOR  WST  WITH  MISSING  DATA 


REGRESSIONS  FOR  CT  WITH  MISSING  DATA 


REGRESSIONS  FOR  CH  WITH  MISSING  DATA 


REGRESSIONS  FOR  WV  WITH  MISSING  DATA 


CO  CD 
i-  <D 
O  t- 
S«- 

S~  I 
LU  QJ 
CO 

00  *r- 

21  O 

Q1  C 


1 « 

CO 

CO 

0 

lo 

CO 

CO 

0 

CTi 

CO 

1  • 

CO 

CO 

CO 

OsJ 

■ 

I  LO 

LO 

LO 

LO 

00 

p— 

OO 

O 

CO 

1  • 

CO 

LO 

CT» 

LO 

1 

LO 

LO 

cr. 

I  Kf 

OJ 

r— 

r— 

0 

1  CO 

1 

CO 

CO 

co 

co 

■ 

1 

CT* 

0 

CO 

LO 

r— 

co 

LO 

LO 

LO 

CO 

LO 

10 

LO 

1  • 

OsJ 

CO 

CO 

OsJ 

LO 

LO 

CO 

CO 

CO 

CO 

CO 

CO 

co 

co 

LO 

LO 

r— 

co 

co 

00 

OsJ 

OsJ 

OsJ 

OsJ 

OsJ 

LO 

OsJ 

OsJ 

<T> 

0 

CO 

0 

O'. 

O 

*3- 

0 

CO 

LO 

LO 

LO 

LO 

— 

•- 

— 

— 

LO 

<T» 

cr* 

cr. 

CO 

OsJ 

OsJ 

OsJ 

co 

co 

CO 

r— 

co 

co 

CO 

CO 

LO 

co 

O'* 

CO 

CO 

CO 

CO 

co 

CO 

1  LO 

co 

<T> 

CO 

LO 

r>. 

CO 

CO 
!  # 

LO 

co 

co 

LO 

r-  CO 

co 


1 

10 

0 

cr* 

LO 

CO 

CO 

OsJ 

r— 

0 

1  CO 

1 

co 

CO 

CO 

CO 

668 


REGRESSIONS  FOR  R  WITH  MISSING  DATA 


out  217.1  -0.79  -3.75 


Measurements  over  Water  Clouds 
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TABLE  3.5.41 


CRAMER-RAO  BOUNDS  FOR  INVERSION  ERRORS 
•  rms  errors  (662  meas) 


a  priori 

LWC 

10 

20.3 

19 

11.7 

37 

3.48 

94 

0.59 

10,19 

8.79 

10,37 

2.02 

10,94 

0.59 

19,37 

1.79 

19,94 

0.59 

37,94 

0.58 

10,19,37 

1.66 

10,19,94 

0.58 

10,37,94 

0.56 

19,37,94 

0.56 

10,19,37,94 

0.55 

10 

37.5 

19 

20.3 

37 

6.31 

94 

1.18 

10,19 

16.4 

10,37 

3.71 

10,94 

1.17 

19,37 

3.56 

19,94 

1.17 

37,94 

1.14 

10,19,37 

3.29 

10,19,94 

1.16 

10,37,94 

1.11 

19,37,94 

1.11 

10,19,37,94 

1.10 

DLT 

CT 

CH 

W  V 

2.09 

0.57 

0.67 

0.23 

2.09 

2.33 

2.70 

0.19 

2.09 

0.70 

0.40 

0.24 

1.81 

0.78 

0.63 

0.31 

2.09 

0.55 

0.60 

0.12 

2.09 

0.35 

0.31 

0.19 

1.19 

0.37 

0.36 

0.21 

2.09 

0.44 

0.33 

0.12 

1.33 

0.45 

0.45 

0.13 

1.21 

0.41 

0.32 

0.20 

2.09 

0.32 

0.29 

0.09 

1.14 

0.34 

0.35 

0.09 

1 .06 

0.28 

0.26 

0.15 

1.11 

0.33 

0.28 

0.11 

1.04 

0.26 

0.24 

0.09 

2.09 

1.13 

m 

m 

2.09 

3.83 

2.09 

1.34 

0.29 

2.00 

1.35 

0.96 

0.32 

2.09 

1.09 

1.19 

0.20 

2.09 

0.69 

0.60 

0.26 

1.69 

0.71 

0.66 

0.28 

2.09 

0.88 

0.66 

0.20 

1.78 

0.89 

0.78 

0.21 

1.71 

0.79 

0.58 

0.27 

2.09 

0.64 

0.57 

0.17 

1.66 

0.66 

0.64 

0.17 

1.60 

0.55 

0.48 

0.23 

1.63 

0.64 

0.52 

0.19 

1.58 

0.51 

0.46 

0.16 

NOTE:  The  Cramer-Rao  bound  is  the  absolute  lower  bound  on  the  inversion  rms 
error  for  any  (linear  or  nonlinear)  processor. 
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IV.  INVERSIONS  FOR  CONTINUOUS-VALUED  PARAMETERS 
USING  VISIBLE,  NEAR  IR  AND  IR  MEASUREMENTS 

Cloud  identification  using  visible,  near  infrared  and  infrared 
data  is  performed  using  the  same  techniques  as  used  for  microwave  data. 

The  first  step  consists  of  forward  simulations  to  compute  intensities 
and/or  brightness  temperatures  as  a  function  of  cloud  and  atmospheric 
parameters.  The  simulations  are  performed  in  two  steps,  as  shown  in 
Figure  4.1.  The  program  RASP  is  used  to  compute  the  extinction  coeffi¬ 
cient  and  single  scattering  albedo  for  a  given  cloud  configuration. 

A  modified  version  of  the  LOWTRAN  3B  program  is  then  used  to  solve  the 
radiative  transfer  problem. 

The  basic  assumptions  and  conditions  used  in  the  simulations  are 
now  given.  A  midlatitude  Spring-Fall  standard  atmosphere  was  used. 
Cloud-free  visibility  was  23  km.  The  simulations  were  run  over  ocean 
for  two  different  data  sets,  with  the  surface  reflections  used  given  in 
Table  4.1.  The  sun  was  assumed  directly  overhead.  Additional  inputs 
required  are  the  values  of  solar  flux  and  the  index  of  refraction;  the 
values  of  solar  flux  are  given  in  Table  4.1  and  the  index  of  refraction 
of  water  at  the  selected  wavelengths  is  given  in  Table  4.2.  In  the  table, 
r  ,  C^ ,  C2  are  the  shape  parameters,  respectively  of  the  analytical  dis¬ 
tribution  of  Deirmendjian  (1964)  for  characterizing  the  cloud  layers. 


-Model  atmosphere 
-Wave  number 
-Cloud  characteristics 
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TABLE  4.1 

INPUT  DATA  USED  FOR  LOWTRAN  &  MULTSCA 


WAVELENGTH 

WAVE  NUMBER 

SOLAR  FLUX 

REFLECTION 
OF  SURFACE 

(microns) 

(cm-1) 

-1  -2  -1  -1 
(erg-sec  -cm  -strd  -cm  ) 

.55 

18180 

6.201  E+9 

.05 

.725 

1  3790 

4.520  E+9 

.025 

1.0 

10000 

2.324  E+9 

.02 

1.6 

6250 

6.366  E+8 

.02 

2.1 

4760 

2.706  E+8 

.02 

3.8 

2630 

3.406  E+7 

.02 

6.7 

1490 

0. 

0. 

10.5 

950  * 

0. 

0. 

11.5 

870 

0. 

0. 

12.5 

800 

0. 

0. 

zbase-ztop  (m) 


2400-2900 
2400-4900 
4400-4900 
330-  660 
660-1320 


Solar  zenith  angle:  overhead  sun 
Receiver  zenith  angle:  overhead 
Surface  temperature:  288. 6°K 
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TABLE  4.2 

INPUT  DATA  USED  FOR  RASP 


Density  =  .1,  .25,  .5,  .75,  1.,  1.25,  1.5,  1.75,  2. 
(g/m3) 


rc  =  10  microns,  C1  =6,  C2  =  0.5 
Composition  =  liquid 


WAVELENGTH 

(cm) 


INDEX  OF  REFRACTION  OF  WATER 


.55 

E-4 

1.333 

E+0,  -1.960 

E-9 

.725 

E-4 

1.331 

E+0,  -3.280 

E-8 

1.0 

E-4 

1.326 

E+0,  -3.040 

E-6 

1.6 

E-4 

1.316 

E+0,  -1.0 

E-4 

2.1 

E-4 

1.3 

E+0,  -8.0 

E-4 

3.8 

E-4 

1.366 

E+0,  -3.70 

E-4 

6.7 

E-4 

1.330 

E+0,  -3.70 

E-2 

10.5 

E-4 

1.179 

E+0,  -6.74 

E-2 

11.5 

E-4 

1.126 

E+0,  -  .142 

E+0 

12.5 

E-4 

1.123 

E+0,  -  .259 

E+0 

4. 1  Inversions  For  Visible/IR  Dataset 


A  series  of  forward  simulations  was  made  to  generate  models  for 
predicting  intensity  and  brightness  temperature.  The  atmospheric  con¬ 
ditions  used  in  these  simulations  were  derived  from  microwave  dataset  #3 
by  eliminating  cases  containing  variations  in  wind  speeo  and  rainfall 
rate,  since  these  variables  are  expected  to  have  little  effect  on  the 
observed  intensities.  A  summary  of  the  distribution  of  parameters  is 
given  in  Table  4.1.1.  The  values  of  the  simulated  intensity  for  the 
visible  and  near  IR  channels  (.55p-3.8y)  were  then  scaled  by  the  incident 
solar  flux  to  give  an  equivalent  albedo.  This  section  and  those  that 
follow  will  use  the  notation  A*x  for  albedo  at  wavelength  X  and  I*x  for 
intensity  at  wavelength  X. 

A  series  of  forward  models  for  predicting  the  observation  given  the 
meteorological  parameters  were  developed.  The  independent  variables  for 
these  regressions  were  nonlinear  functions  of  the  parameters  LWC,  CH 
and  CT.  Also  used  in  these  models  were  powers  of  the  cloud  optical  depth 
(OD).  The  later  parameter  is  basically  proportional  to  the  LWC.  The 
dependent  variable  was  either  albedo  or  observed  radiance. 

The  results  of  these  forward  models  are  given  in  Tables  4.1.2  through 
4.1.11.  In  the  regression  models  for  intensity  (x  =  6.7p  -  1 2 . 5u ) ,  the  units 
of  the  dependent  variables  are  107  erg-sec"1 -cnf2-str_1 -cm"1 . 

It  may  be  seen  that  the  models  in  the  visible  range  are  quite 
nonlinear  in  LWC  (or  its  equivalent  OD)  with  other  parameters  entering 
only  later  in  the  regression.  For  the  near-IR  channels,  the  forward 


models  begin  to  place  more  emphasis  on  CH  as  a  predictor.  In  the  models 
for  the  thermal  IR  channels,  the  intensity  is  a  strong  function  of  CH 
with  little  dependence  on  LWC. 


Inversion  Using  Linear  Regression 

A  series  of  noise-free  inversions  using  linear  regression  models 
was  performed.  The  results  are  shown  in  Tables  4.1.12-4.1.14,  in 
which  the  models  and  statistical  performance  are  shown  for  estimating 
LWC,  CT  and  CH.  Cloud  top  height  estimation  performance  was  the  best 
with  an  expected  strong  dependence  on  a  thermal  IR  channel.  Liquid  water 
content  and  cloud  thickness  were  estimated  less  well  with  the  3.8  u 
channel  being  the  best  linear  predictor.  It  should  be  noted  that  the 
strong  nonlinear  dependence  of  visible  channel  albedo  on  LWC  is  not 
exploited  by  the  linear  regression  scheme  of  this  experiment. 
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TABLE  4.1.1 
VISIBLE  AND  INFRARED 
DATA  SET 
190  CASES 


LWC  (gm/cm2 

x  103) 

6(6)60 

26(26)260 

4.3(4.3)43 

7.6(7.6)76 

286(26)520 

# 

no 

30 

10 

10 

30 

DLT  (°K) 

-5 

0 

5 

# 

20 

150 

20 

CT  (m) 


330 

500 

660 

2500 

# 

10 

no 

10 

60 

CH  (m) 


660 

1320 

1500 

2900 

4900 

# 

10 

10 

10 

50 

no 

WV  (gm/cm2) 


.89 

1.02 

1.38 

1.62 

1.70 

1 .82 

1.93 

2.29 

2.34  2.44 

# 

10 

10 

20 

30 

30 

20 

30 

10 

10  20 
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TABLE  4.1.2 

REGRESSION  MODEL  FOR  A 


Variable 

Coefficient 

F  Value 

LWC2  x  CH2 

4. 1 95  x  1 0_1 0 

7.14 

OD 

-7.120  x  10"3 

1199.41 

OD2 

9.866  x  10‘5 

627.38 

OD3 

-6.637  x  10'7 

607.18 

OD4 

2.004 x  10-9 

488.03 

OD5 

-2.260  x  10'12 

408. 66 

1  /OD 

-1 .882 

5413.62 

1  /OD4 

2.837  x  101 

710.00 

1  /OD5 

-4.570  101 

585.85 

R2  .999 


Intercept  Estimate  1.096 
Error  Standard  Deviation  2.717 x  10"3 
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TABLE  4.1.3 

REGRESSION  MODEL  FOR  A 


Variable 

Coefficient 

F  Value 

LWC2  x  CH 

-2.380 x  10"6 

65.86 

1/(LWC2xCH) 

-1.967 x  101 

86.88 

2  2 

LWC  x  CH4 

5.943  x  10'8 

77.39 

LWC3  x  CH2 

-6.224  x  10‘8 

84.80 

OD 

-6.806 x  10"3 

175.65 

OD3 

6.230  x  10"6 

83.40 

OD5 

7.064 x  10"13 

105.41 

1  /OD 

-1.753 

1579.66 

1  /OD4 

9.414 

199.61 

R2  . 987 


Intercept  Estimate  1.093 
Error  Standard  Deviation  9.724  x  10’3 


TABLE  4.1.4 


REGRESSION  MODEL  FOR  A] 


Variable 

Coefficient 

F  Value 

2  2 
LWC^ x  CFT 

5.838 x  10'9 

580.14 

LWC3 x  CH2 

-1.754  xlO'10 

253.88 

OD 

-3.778 x  10"3 

881.87 

OD3 

1 . 612  x  1 0"6 

242.53 

OD5 

3.297  x  10'13 

361.90 

1  /OD 

-1.349 

1983.33 

1/OD5 

5.203 

730.15 

R2  .998 

Intercept  Estimate  1.013 
Error  Standard  Deviation  3.540  x  10 


L.  i 


TABLE  4.1.5 


REGRESSION  MODEL  FOR  A,  , 

I  .  0 


Variable 

Coefficient 

F  Value 

WV 

-2.721  x  10"4 

5.538 

1/(LWC2xCH) 

5.369 

969.36 

?  ? 
Lwr  X  CFT 

-2.972  x  10'9 

1083.49 

LWC3 x  CH2 

5.653  x  10'11 

1174.10 

OD 

7.900  x  10"3 

1533.67 

2 

od£ 

-9.272 x  10"5 

1290.79 

o 

o 

-2.777 x  10"9 

1086.12 

OD5 

3.499  x  10"12 

1021 .68 

1  /OD 

3.577 

2760.95 

1  /OD2 

-2.275  x  101 

3650.22 

1/0D3 

5.082  x  101 

3218.55 

1/OD4 

-3.982 x 101 

2844.19 

R2  .999 

Intercept  Estimate  4.980  x  10"^ 

-4 

Error  Standard  Deviation  6.411x10 


r 


TABLE  4.1,6 

REGRESSION  MODEL  FOR  ] 


Variable 

Coefficient 

F  Value 

CH 

-2.044  x  10'5 

2287.19 

LWC  x  CT 

-2.698 x  10"6 

9422.25 

LWC  x  CH 

4.113  x  10"7 

918.70 

LWC2  x  CH 

5.425 x  10"8 

7621.10 

OD2 

-2.372  x  10‘6 

7646.43 

OD3 

-3.523 x  10'6 

2319.69 

1  /OD 

-2.046  x  10'1 

16163.84 

1/OD2 

2.061 

25235.01 

1  /OD3 

-5.995 

24528.71 

1/OD4 

-1 .255  x  101 

19228.44 

1/OD5 

3.609  x  101 

67999.58 

R2  .999 

Intercept  Estimate  4.858  xio”1 
Error  Standard  Deviation  3.929 xio-5 
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TABLE  4.1.7 


REGRESSION  MODEL  FOR  A 


Variable 

Coefficient 

F  Value 

CH 

-6.234 x  10'4 

403.15 

LWC  x  CH 

-1 .685  x  10-5 

635.04 

LWC2xCH 

2.786  x  10”7 

403.49 

OD2 

-2.095  x  10‘5 

225.33 

OD3 

oo 

o 

X 

CO 

00 

1 

51 .04 

OD5 

^4 

X 

O 

1 

-P* 

13.03 

Intercept  Estimate  3.284  x  10 


Error  Standard  Deviation  3.377  x  10 
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TABLE  4.1.8 

REGRESSION  MODEL  FOR  Ig  ? 


R2  .923 

Intercept  Estimate  3.738 
Error  Standard  Deviation  1.705  x  10 


TABLE  4.1.9 


REGRESSION  MODEL  FOR  I]0  g 


Variable 

Coefficient 

F  Value 

-2 

CH 

-5.830  x  10 

2731.59 

1/OD 

1 .338 

34.78 

R2  .945 

Intercept  Estimate  6.641 
Error  Standard  Deviation  2.101  x  io 


TABLE  4.1.11 


REGRESSION  MODEL  FOR  I]2  g 

Variable  Coefficient  F  Value 

CH  -4.957  xlO"2  2719.44 

l./OD  1.032  34.90 

R2  .945 

Intercept  Estimate  6.357 
Error  Standard  Deviation  1.790  xl0_1 
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TABLE  4.1.13 


REGRESSION  MODEL  FOR  CT 
(CT  in  m  x  100) 


Variable 

Coefficient 

F  Value 

A .  725 

-4.820 x 101 

H.80 

A1 . 6 

6.650 x  101 

8.01 

A3.8 

-3.367  x  102 

504.18 

r6.7 

-7.926  x  lo'7 

4.00 

1 1 2 . 5 

1 .034  x  IQ"6 

9.60 

R2  .819 

Intercept  Estimate  6.637 x  101 
Error  Standard  Deviation  4.029 
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TABLE  4.1.14 
REGRESSION  MODEL  FOR  CH 


(CH  in  m  x  1 00) 


Variable 

Coefficient 

F  Value 

A1.0 

-1.581  x lo1 

22.73 

-6 

1 1 0 . 5 

-1.592  x  10 

2913.11 

R2  .942 

2 

Intercept  Estimate  1.231x10 
Error  Standard  Deviation  3.471 
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V.  CLASSIFICATION  OF  DISCRETE  VALUED 
PARAMETERS  USING  VISIBLE  AND  I.R.  DATA. 

DISCRIMINATION  OF  ICE  CLOUDS,  WATER  CLOUDS, 

CLEAP  COLUMN  WITH  SNOW,  CLEAR  COLUMN  WITH  LOAM 

5.1  Introduction 

The  nature  of  the  problem  addressed  by  this  section  is  the  one  of 
automatic  scene  classification  given  a  set  of  measurements  in  a  spectral 
interval.  Specifically,  the  problem  at  hand  is  to  classify  the  scene 
into  several  categories:  (1)  a  glaciated  cloud  or  ice  cloud,  (2)  a  water 
cloud,  (3)  a  clear  column  with  snow  or  (4)  a  clear  column  with  bare  ground. 

The  spectral  interval  of  interest  for  the  purposes  of  this  classifi¬ 
cation  is  the  specified  wavelengths  in  the  band  from  . 5!>  p  to  12.6  y.  The 
wavelengths  in  the  microwave  are  excluded  from  consideration.  Examination 
of  brightness  temperatures  as  a  function  of  particle  size,  snow  depth,  etc. 
compiled  for  a  model  proposed  by  Chang  et  al .  (1976)  show  that  a  meter  of 
new  snow  is  virtually  indistinguishable  from  a  bare  surface,  while  10  cm 
of  snow  subjected  to  repeated  freeze  thaw  cycles  responds  quite  noticeably. 
In  general,  Chang  et  al . ,  state  that  the  penetration  depth  of  most  snows 
range  from  10  to  100  wavelengths.  Due  to  this  variability,  the  microwave 
channels  will  be  excluded. 

The  approach  that  will  be  taken  in  the  scene  classification  is  the  non- 
parametric  approach  of  Friedman  (1977).  This  approach  has  several  desirable 
properties: 

(i)  The  procedure  is  computationally  simple  for  both  training  and 
classification  phases.  This  is  of  great  importance  in  the  present  problem, 
due  to  its  high  dimensionality. 
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(ii)  The  decision  rule  that  results  from  the  training  phase  is 
asymptotically  Bayes1  risk  efficient  as  the  number  of  training  samples 
increases  without  limit. 

(iii)  Additional  features  may  be  easily  added  prior  to  the  training 
phase.  These  may  be  linear  or  nonlinear.  Noninformative  features  are 
simply  ignored  in  the  training  phase.  However,  any  of  the  additional  fea¬ 
tures  which  are  informative  are  used,  and  become  part  of  the  classification 
algorithm. 

The  discrimination  experiment  performed  using  the  Friedman  Tree 
approach  entailed  the  discrimination  of  four  scene  categories:  clear 
columns  over  snow,  clear  columns  over  loam,  ice  clouds  over  either  back¬ 
ground  and  water  clouds  over  either  background.  The  results  of  the  experi¬ 
ment  indicate  that  there  exist  robust  features  to  discriminate  clear 
columns  and  water  clouds.  Ice  clouds  are  also  usually  discriminated,  but 
the  partitioning  trees  are  more  complex. 

5 . 2  The  D iscrimination  Experiment  Data  Base 

The  experimental  data  base  used  in  both  discrimination  experiments 
was  constructed  using  simulations  from  the  modified  LOUTRAN  3B  code  provided 
by  Environmental  Research  and  Technology  Inc.  (ERT).  As  presently  imple¬ 
mented,  this  code  provides  the  ability  to  compute  the  observed  radiance  or 
brightness  temperature  from  a  user  specified  atmosphere  over  a  surface  with 
a  user  specified  reflectivity.  A  single  cloud  layer  with  a  user  specified 
absorption  and  sinile  scattering  albedo  may  be  inserted  in  atmosphere.  These 
two  cloud  quantities  were  computed  for  a  cloud  phase  and  drop  size  distri¬ 
bution  using  the  E  :T  program  RASP. 
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The  data  base  was  constructed  from  cloud  thickness,  phase  and  alti¬ 
tudes  in  a  manner  expected  to  span  the  single  layer  cloud  types  that  might 
be  encountered  in  mid-latitude  and  sub-arctic  winter.  Due  to  current 
limitations  in  the  simulation  software,  multi-layer  or  multi-phase  cloud 
types  could  not  be  simulated. 

All  cloud  types  were  simulated  over  both  a  snow  surface  and  a  loain 
surface.  The  liquid  phase  clouds  were  simulated  using  the  45°  winter 
supplemental  atmcsphere  internal  to  LOWTRAN.  The  ice  phase  clouds  were 
simulated  using  both  the  LOWTRAN  60°  winter  and  45°  winter  supplemental 
atmospheres. 

In  the  simulations  that  used  the  45°  latitude  atmosphere,  the  entire 
simulation  was  run  using  a  surface  temperature  of  265°  K  and  again  using 
a  surface  temperature  of  272.15°  K.  For  the  60°  winter  atmosphere,  the 
surface  temperatures  used  were  260°  K  and  255°  K.  The  exception  to  this 
surface  temperature  practice  were  the  simulated  clear  columns.  For  these 
cases,  the  surface  temperature  was  varied  from  255°  K  to  260°  K  in  0.5°  K 
increments  for  the  60c  winter  atmosphere  and  265°  K  to  271.5°  K  and  272.15°K 
in  0.5°  K  increments  for  the  45°  winter  atmosphere. 

A  summary  of  the  reflectivities  used  for  the  loam  and  snow  surfaces 
is  given  in  Table  5-1.  The  figures  for  snow  were  abstracted  from  Valley 
(1966)  and  O'Brien  and  Munis  (1974). 

The  reflectivity  for  3.8  pm  was  extrapolated.  The  reflectivity  values 
for  loam  were  taken  or  extrapolated  from  Valley  (1965).  The  values  for  the 
reflectivity  at  wavelengths  greater  than  4  ym  was  taken  to  be  zero  for  both 
surface  types. 
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TABLE  5-1 

ASSUMED  SURFACE  REFLECTIVITES 


WAVELENGTH 

(MICRONS) 

SNOW 

REFLECTIVITY 

LOAM 

REFLECTIVITY 

.5501 

0.737 

0.0875 

.7252 

0.775 

0.1875 

1,0000 

0.775 

0.3000 

1.6000 

0.070 

0.3000 

2.1008 

0.050 

0.3000 

3.8023 

0.050 

0.3000 

6.7114 

0.0 

0.0 

10.5263 

0.0 

0.0 

11.4943 

0.0 

0.0 

12.5000 

0.0 

0.0 
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The  values  of  the  solar  flux  were  taken  to  be  those  given  earlier 
in  this  report.  It  should  be  noted  that  these  values  will  not  be  exactly 
correct  for  these  simulations  due  to  the  difference  in  the  inclination 
of  the  earth's  axis  between  spring  and  winter  and  45°  and  60°  latitude. 
For  those  cases  in  which  the  discriminating  features  are  ratios  of 
observed  radiances  in  either  the  visible  or  near  IR,  this  use  of  45° 
spring-fall  flux  will  be  transparent.  For  other  features  it  will  be  a 
simple  shift  in  the  cut  value  due  to  the  monotone  properties  of  the 
Friedman  Tree  algorithm. 

A  summary  of  the  cloud  altitudes  for  the  liquid  phase  clouds  simu¬ 
lated  is  given  in  Table  5-2.  The  liquid  water  densities  for  these  clouds 

3  3  3  3 

were  simulated  at  .25  g/m  ,  .5  g/m  ,  .75  g/m  and  1  g/m  for  ell  clouds 
3  3  3  3 

and  1.25  g/m  ,  1.5  g/m  ,  1.75  g/m  ,  and  2.0  g/m  for  those  clcuds  thicker 
than  1  km. 

A  summary  of  the  cloud  altitudes  for  the  ice  phase  cloud-  is  given 

3  3 

in  Table  5-3.  These  clouds  were  simulated  at  .025  g/m  ,  .05  g/m  , 

.1  g/m3  and  .2  g/m3. 

The  computed  absorption  and  single  scattering  albedo  for  a  liquid 

3  3 

cloud  with  a  density  of  1  g/m  and  an  ice  cloud  with  a  density  of  .1  g/m 

are  given  in  Tables  5-4  and  5-5. 

The  total  number  of  simulations  run  was  599.  Of  these  74  were  clear 
columns,  188  were  water  clouds  and  337  were  ice  clouds. 


j 
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TABLE  5-2 

WATER  CLOUD  ALTITUDES 
USED  IN  THE  SIMULATION 


BASE 

(METERS) 

TOP 

(METERS) 

500 

1000 

1000 

1500 

1500 

2000 

2000 

2500 

2500 

3000 

3000 

3500 

3500 

4000 

1000 

2000 

2000 

3000 

2500 

3500 

3000 

4000 

1000 

3000 

1500 

3500 

2000 

4000 
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TABLE  5-3 

ICE  CLOUD  ALTITUDES 
USED  IN  THE  SIMULATION 


BASE 

TOP 

(METERS) 

(METERS) 

6000 

6200 

4000 

4500 

6000 

6500 

7000 

7500 

8000 

8500 

2000 

3000 

3000 

4000 

4000 

5000 

5000 

6000 

6000 

7000 

7000 

8000 

8000 

9000 

3000 

5000 

k 
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TABLE  5-4 


WATER  CLOUD  CHARACTERISTICS 
USED  IN  THE  SIMULATION 
FOR  A  DENSITY  OF  1  g/m3 


WAVELENGTH 

(MICRONS) 

EXTINCTION 

NEPERS/CM 

SINGLE 

SCATTERING 

ALBEDO 

.55 

6.160 

X 

10"4 

1.00000 

.72 

6.164 

X 

10"4 

1.00000 

1.00 

6.252 

X 

10'4 

.99972 

1.60 

6.468 

X 

10"4 

.99159 

2.10 

7.172 

X 

10"4 

.94251 

3.8 

7.108 

X 

10"4 

.82479 

6.7 

7.200 

X 

10"4 

.55901 

10.0 

7.408 

X 

1C"4 

.56201 

11.5 

6.460 

X 

10"4 

.46718 

12.5 

6.592 

X 

10"4 

.46395 
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TABLE  5-5 

ICE  CLOUD  CHARACTERISTICS 
USED  IN  THE  SIMULATIONS 
FOR  A  DENSITY  OF  .1  g/m3 


WAVELENGTH 

(MICRONS) 

EXTINCTION 

NEPERS/CM 

SINGLE 

SCATTERING 

ALBEDO 

.55 

1.540 

X 

-5 

10 

1.00000 

.72 

1.541 

X 

10"5 

1.00000 

1.00 

1.563 

X 

IQ'5 

0.99972 

1.60 

1.612 

X 

IQ'5 

0.99139 

2.10 

1.807 

X 

io-5 

0.93875 

3.8 

1.737 

X 

10'5 

0.77424 

6.7 

1 .668 

X 

10  0 

0.50000 

10.0 

1.668 

X 

IO'5 

0.52100 

11.5 

1.682 

X 

10"5 

0.50332 

12.5 

1.651 

X 

10'5 

0.50228 
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5.3  Partitioning  Results 

Using  the  simulations  of  the  database,  a  single  major  experiment  was 
performed.  The  classes  to  be  discriminated  were  clear  columns  over  loam,  ' 
clear  columns  over  snow,  liquid  clouds  and  ice  clouds.  The  purpose  of  this 
experiment  was  twofold.  First,  it  was  performed  in  support  of  the  inver¬ 
sion  experiments  in  earlier  sections  of  this  report.  These  experiments 
rely  on  a  clear  column,  water  cloud,  ice  cloud  discrimination  for  their 
choice  of  algorithm.  The  second  purpose  of  this  experiment  was  strictly 
meteorological  as  the  presence  or  absence  of  clouds  and  their  phase  are 
important  parameters  in  their  own  right.  In  the  experiment,  the  sensitivity 
of  the  partitioning  to  the  simulation  accuracy  was  explored  by  repeating 
the  experiments  with  0.1,  0.5,  1.0,  5.0  and  10.0  percent  multiplicative 
pseudorandom  noise  from  a  normal  distribution  on  the  data. 

The  features  used  in  the  discrimination  experiments  were  the  observed 

radiances  at  the  visible  and  near  IR  wavelengths,  brightness  temperatures 

at  the  thermal  wavelengths  and  the  ratios  between  these  variables.  As  before 

-1  -2  -1  -1 

the  radiance  used  in  the  experiment  was  erg-sec  -cm  -strd  -cm  .  The  mono¬ 
tone  properties  of  the  Friedman  Tree  algorithm  guarantee  that  this  choice 
of  feature  space  will  be  fairly  extensive.  For  the  noiseless  case,  the  use 
of  any  other  monotone  function  of  the  observed  radiances  such  as  a  Albedo 
would  yield  equivalent  partitioning  rules. 

The  partitioning  trees  for  the  experiment  using  1.0  percent  noise  are 
given  in  Figs.  5.1  -5.4.  The  trees  for  1  percent  noise  rather  than  other 
values  are  given  for  the  results  of  this  experiment  for  reasons  of  robustness. 
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Figure  5.3.  Decision  Tree  for  Clear  Columns  Over  Loam 
1.0  %  Multiplicative  Noise 


Fig.  5.4.  Decision  Tree  For  Clear  Columns  Over  Snow 
1.0%  Multiplicative  Noise 
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As  discussed  in  the  following  section,  the  features  selected  represent 
those  with  real  discrimination  ability,  rather  than  those  whose  discrimi¬ 
nation  ability  comes  from  small  differences  in  boundary  assumptions,  etc. 

A  number  of  quantities  are  depicted  in  these  figures.  The  starting  node 
of  the  tree  is  denoted  by  an  S  and  the  terminal  node  of  the  tree  by  a  T. 

The  quantity  in  the  box  is  the  partitioning  feature  with  its  cut  value. 

For  the  first  experiment,  the  bracketed  quantities  below  a  node  are  the 
number  of  clear  columns  over  loam,  clear  columns  over  snow,  water  clouds 
and  ice  clouds  respectively  that  descended  to  that  node.  The  probability 
P.j  given  at  a  node  is  the  conditional  probability  that  an  observation  of 
the  class  to  be  discriminated  will  appear  at  the  node  given  that  the  tree 
was  entered  with  a  member  of  that  class.  The  probability  is  the  condi¬ 
tional  probability  of  the  other  classes  reaching  that  node. 

Tables  5-6  through  5-29  give  the  decision  trees  for  the  first  experi¬ 
ment  in  tabular  form.  These  are  given  for  noises  of  0,  0.1,  0.5,  1.0,  5.0 
and  10.0  percent. 

5.4  Discussion  of  the  4  Classes  Discrimination 

The  decision  trees  for  almost  all  cases  are  quite  simple,  consisting 
usually  of  only  two  or  three  levels.  The  tree  for  water  clouds  are  uni¬ 
versally  the  shortest.  For  no  noise,  the  feature  used  for  discrimination 
is  T,„  C/T,0  c-  For*  noise  values  from  0.1%  through  1.0%,  the  choice 
shifts  to  1^  g/T] 2  g.  At  5 %  and  10%  noise,  the  feature  used  is  1^  &  alone. 
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The  T10  5^12  5  feature  Probably  an  artifact  resulting  from  an  assump¬ 
tion  of  100%  emissivity  of  the  surface.  The  feature  selected  results 
from  water  clouds  being  both  relatively  bright  at  1.6  u  and  cold  in  the 
thermal  region,  thus  yielding  a  high  ratio  and  high  I-j  ^  value  alone. 

The  decision  trees  for  ice  clouds  are  somewhat  more  complicated. 

The  primary  feature  used  is  the  ratio  of  the  two  near-IR  channels  1.6  y 
and  2.1  y.  The  few  cases  not  discriminated  by  this  test,  are  resolved  by 
tests  using  ratios  of  a  visible  or  near-IR  channel  to  a  thermal  one.  The 
primary  feature  results  from  the  fact  that  the  cloud  model  used  calculated 
a  lower  reflectance  for  ice  clouds  at  1.6  y  relative  to  their  2.1  y 
reflectance  than  was  calculated  for  water  clouds  or  assumed  for  loam  and 
snow. 

The  primary  discriminant  for  clear  columns  over  loam  is  the  ratio  of 
1-j  q  to  a  thermal  channel  (usually  T^  5)-  At  noises  at  or  above  5.0%, 
this  changes  to  the  value  of  a  visual  channel  alone.  This  comes  from  the 
tact  that  the  loam  surface  is  both  visually  dark  and  thermally  warm. 

The  final  case  of  clear  columns  over  snow  also  uses  a  1^  I 2  -j 

ratio  for  a  low  noise  discriminating  feature.  The  lack  of  robustness 
of  this  feature  is  evidenced  in  the  cases  of  increased  noise  by  a  switch 
to  the  ratio  of  I  or  1^  Q  to  This  feature  is  quite  robust 

through  the  10%  noise  case  and  discriminates  well.  It  is  due  to  the 
sudden  drop  in  the  surface  reflectivity  of  snow  at  about  1.6  y.  This 
feature  has  been  noted  previously  by  Valovcin  (1976)  and  Hunt  et  al .  (1974). 
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VI.  INVERSIONS  USING  COMBINED  MICROWAVE, 

VISIBLE  AND  INFRARED  DATA 

6.1  Inversion  for  Continuous-Valued  Parameters 

A  series  of  inversions  were  tried  using  the  combination  of  microwave, 
visible  and  infrared  channels  studied  in  the  preceding  chapters.  The 
data  set  used  was  the  one  employed  for  analyzing  visible  and  infrared 
data  (cf..  Table  4.2.1),  since  the  primary  interest  was  to  assess  the 
potential  of  all  sensors  when  each  sensor  may  contribute  some  information. 

A  series  of  linear  regressions  was  run  for  liquid  water  content,  cloud 
thickness  and  height  and  water  vapor  and  the  results  are  given  in  Tables 

6.1.1  -  6.1.4.  It  can  be  seen  that  the  10.7  GHz  channel  is  a  significant 

2 

predictor  of  liquid  water  content  (R  =  99.6),  and  that  the  6.7  micron 
and  19  GHz  channels  have  some  predictive  power.  The  3.8  micron  channel  is 
the  most  significant  for  predicting  cloud  thickness  (R  =  76.2),  but  this 
parameter  cannot  be  determined  with  great  accuracy.  Cloud  top  height 
was  correlated  most  highly  with  the  10.5  micron  channel  intensity 
(R  =  93.5)  and  to  a  lesser  extent  with  the  94  GHz  brightness  temperature 
and  1.6  micron  channel  intensity.  Water  vapor  was  most  highly  correlated 
with  the  four  microwave  channel  brightness  temperatures  (R  =  88.5). 

Next,  the  effect  of  adding  noise  to  the  test  set  was  evaluated,  using 
the  same  linear  regression  models.  Noise  levels  of  0.1%  and  1%  rms  were 
used,  with  the  results  shown  in  Tables  6.1.5  -  6.1.8.  The  0.1%  noise  is 
seen  to  have  an  insignificant  effect  in  all  cases.  However,  the  1%  noise 
caused  significant  degradation  in  the  inversions  for  liquid  water  content 
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Table  6.1.1 

INVERSION  FOR  LWC  VIA  LINEAR 
REGRESSION  USING  VISIBLE,  INFRARED 
AND  MICROWAVE  DATA 


Order 

Variable 

Coefficient 

2 

R^ 

0 

0 

mean 

*1.017  x  102 

0 

136.48 

1 

T10 

1.764  x  101 

.996 

8.63 

2 

T19 

-1.369 

.998 

6.54 

3 

A3.8 

-9.965  x  102 

.999 

3.98 

4 

T37 

-1.256 

.999 

3.50 

5 

1 1 0 . 5 

3.639 x  101 

.999 

3.33 

6 

A2. 1 

1 .959  x  102 

.999 

2.97 

7 

1 1 2 . 5 

-2.304  x  101 

.999 

2.92 

3 

Intercept  Estimate  -1.452  x  10 


*Mean  value  is  given  for  reference  purposes  only. 
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Table  6.1.2 

INVERSION  FOR  CT  VIA  LINEAR 
REGRESSION  USING  VISIBLE,  INFRARED 
AND  MICROWAVE  DATA 


Order 

Variable 

Coefficient 

O 

R* 

D 

0 

mean 

*1.131  x  lO1 

0 

9.36 

1 

A3.8 

-1.154  x  103 

.762 

4.57 

2 

1 1 2 . 5 

3.248 x  101 

.799 

4.21 

3 

T94 

-4.754  x  10  1 

.818 

4.02 

4 

T37 

-1.063 

.825 

3.95 

5 

T1 9 

1.157 

.846 

3.73 

6 

*6.7 

-2.236 x 101 

.869 

3.44 

7 

T10 

-1 .230 

.883 

3.27 

8 

_ 

A1.0 

6.157  x  101 

.891 

_ 

3.15 

Intercept  Estimate  4.718 x  lo2 


*Mean  value  is  given  for  reference  purposes  only. 


Table  6.1.3 


INVERSION  FOR  CH  VIA  LINEAR 
REGRESSION  USING  VISIBLE,  INFRARED 
AND  MICROWAVE  DATA 


Order 

Variable 

Coefficient 

2 

R^ 

a 

0 

mean 

*3.783  x  101 

0 

14.26 

1 

*10.5 

-1.581  x  101 

-1.592  x  101 

.935 

3.67 

2 

A1.0 

.942 

3.47 

Intercept  estimate  1.231  ><10 
*Mean  value  is  given  for  reference  purposes  only 


Table  6.1.4 


INVERSION  FOR  WV  VIA  LINEAR 
REGRESSION  USING  VISIBLE,  INFRARED 
AND  MICROWAVE  DATA 


Order 


0 

1 

2 

3 

4 

5 


Variable 

Coefficient 

70 

ro 

- 1 

mean 

*0 

0 

T19 

1 . 988  x  10'1 

.092 

tiq 

-3.163  x  10"1 

.583 

T37 

-4.384  x  10‘2 

.876 

T94 

6.761  x  10‘3 

.885 

oo 

CO 

c 

- 

-4.625 

.893 

Intercept  estimate  1.740  x  10 


1 


*Mean  value  is  given  for  reference  purposes  only. 
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Table  6.1.5 

EFFECT  OF  MEASUREMENT  NOISE  ON 
INTEGRATED  LIQUID  WATER  REGRESSIONS 
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Table  6.1.7 

EFFECT  OF  MEASUREMENT  NOISE  ON 
CLOUD  HEIGHT  LINEAR  REGRESSIONS 

•  combined  measurements 

*  °model  =  0 
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and  water  v.  por.  There  was  only  a  small  degradation  in  the  cloud 
thickness  inversions,  and  the  inversions  for  cloud  height  were 
essentially  unaffected. 

6 . 2  Scenari o  for  Inference  of  Geophysical  Parameters  Using  Combined 
Visible,  Infrared  and  Microwave  Data 

A  possible  scenario  for  geophysical  parameter  inversion  is  given  in 
Figures  6.2.1  and  6.2.2.  In  Figure  6.2.1,  a  method  for  inversion  over  the 
ocean  surface  is  given.  The  first  step  is  separation  of  clouds  from  clear 
columns,  with  further  classification  of  cloud  type  into  either  water  cloud 
or  ice  cloud.  This  may  be  done  using  the  decision  tree  approach  given  in 
Secticn  V.  Following  this  step,  three  conditional  inverters  are  utilized. 
The  water  cloud  inverter  is  perhaps  the  most  complex  of  the  three;  for  this 
reason,  it  has  received  the  most  extensive  treatment  in  this  study  (cf.. 
Chapters  III,  IV).  The  possibility  of  inverting  for  cloud  top  height  and 
thickness,  cloud  water  content,  surface  wind  speed,  atmospheric  water  vapor 
and  rainfall  rate  has  been  studied  at  length  in  that  Chapter,  and  the 
reader  is  referred  to  that  discussion  for  further  information.  The  clear 
column  inverter  would  be  used  for  determination  of  wind  speed  and 
atmospheric  water  vapor,  using  all  channels,  in  the  same  manner  as  the 
water  cloud  inverter.  The  same  approach  could  be  used  for  the  ice  cloud 
inverter,  if  it  was  desired  to  estimate  the  continuous-valued  parameters 
in  that  case. 

The  situation  over  land  is  given  in  Figure  6.2.2  and  is  seen  to  be 
somewhat  more  complex.  The  first  step  is  a  classification  of  both  cloud 
and  surface  type.  Here,  a  total  of  four  categories  is  used: 


Observed  Radiances 


Figure  6.2.1  Inversion  Scenario  for  Inversion  Over  Ocean 


Observed  Radiances 


Figure  6 „ 2 0 2  Inversion  Scenario  for  Inversion  Over  Land 


Clouds 


Surface 


The  classification  may  be  accomplished  using  the  decision  tree  approach 
of  Chapter  V.  Extensive  results  are  given  there  to  illustrate  the 
structure  and  error  rates  of  such  trees.  Once  this  classification  has 
been  made,  the  next  step  is  to  invert  for  the  continuous-valued  parameters 
using  one  of  the  four  conditional  inverters  shown  in  the  figure.  These 
inverters  may  be  designed  using  the  methods  of  Chapter  II. 
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VII.  PLANS  FOR  TESTING  AND  VERIFICATION 

The  inversion  and  classification  schemes  developed  in  this  study 
were  constructed  through  the  use  of  simulated  observations.  For  this 
reason,  the  results  obtained  through  the  use  of  the  methodologies  are  in 
some  sense  hypothetical.  Although  every  effort  was  made  to  provide  real  is 
tic  simulations,  there  are  factors  such  as  the  spectral  response  of  the 
optical  filters  of  an  instrument  which  has  neither  been  built  nor  even 
specified,  that  are  impossible  or  expensive  to  simulate.  At  such  time 
as  observations  from  a  single  package  are  available,  however,  it  will  be 
advisable  to  implement  and  test  the  algorithms  to  determine  their  accuracy 

Two  major  classes  of  algorithms  were  presented  in  this  study.  The 
first  class  provided  estimates  of  continuous  valued  variables  such  as 
wind  speed,  cloud  height,  etc.  The  second  provides  a  means  of  scene 
classification.  One  common  characteristic  of  both  is  that  they  are  both 
area  averages. 

In  order  to  implement  the  algorithms  suggested  by  this  study,  several 
steps  will  be  necessary: 

1.  Specification  of  the  actual  instrument  parameters. 

2.  Creation  of  new  forward  models  for  those  frequencies/wavelengths 

or  look  angles  not  used  in  this  study  but  present  on  the  instrument. 

« 

3.  Creation  of  new  regression  estimators  to  account  for  any  dif¬ 
ferent  selection  of  frequencies/wavelengths  or  look  angles  in  the  in¬ 
strument  from  those  considered  in  this  study. 

4.  The  creation  and  implementation  of  a  data  handling  package  that 
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implements  the  algorithms. 

5.  The  construction  of  a  verification  dataset  that  is  matched  to 
the  satellite  observations  in  space/time. 

6.  Fine  tuning  of  forward  models  and  Friedman  trees  from  matched 

data. 

7.  Verification  and  error  analysis  of  results. 

Step  1  will  not  be  addressed  by  this  section.  Suggestions  have  been 
made  in  the  course  of  this  study  as  to  the  suitability  of  various  fre¬ 
quencies/wavelengths  for  the  inference  of  various  parameters.  The  ultimate 
choice  is  often  modulated  by  hardware  considerations  such  as  local 
oscillator  frequencies,  antenna  and  telescope  size  and  number,  etc. 

It  is  beyond  the  scope  of  this  study  to  address  these  problems. 

Steps  2  and  3  will  be  required  if  new  or  different  frequencies  are 
used  in  the  ultimate  instrument  or  if  the  algorithm  is  to  be  tested  in 
other  than  a  nadir  looking  mode.  The  methods  for  performing  steps  2  and 
3  are  given  in  Sections  III  and  IV  of  this  study.  Briefly,  they  will 
entail  the  generation  of  a  data  base  of  simulations  and  the  performance 
of  a  stepwise  regression  analysis  using  terms  of  the  form  identified 
in  this  study. 

Step  4  is  again  instrument  dependent  and  will  not  be  considered  in 
depth.  The  inversion  algorithms  developed  in  this  study  should  not 
present  any  difficulties  in  this  phase. 

Steps  5,  6  and  7  are  similar  in  nature  arid  will  be  considered  here 
in  conjunction  with  each  other.  Step  5  represents  the  collection  of 
data  to  drive  steps  6  and  7.  Step  5  represents  the  fine  tuning  of  the 
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models  to  reality.  The  importance  of  this  step  cannot  be  overemphasized. 
The  radiative  transfer  models  used  in  the  model  development,  while  rea¬ 
sonable  in  terms  of  the  time  and  resources  available  for  this  study,  do 
not  handle  the  entire  spectrum  of  naturally  occurring  clouds  and  surfaces 
Further,  the  instrument  itself  will  introduce  effects  such  as  scan  angle 
dependent  gains  and  biases  that  must  be  removed.  Thus,  it  is  important 
to  produce  a  set  of  observations  matched  with  known  meteorological  condi¬ 
tions.  The  expected  radiance  based  on  the  radiative  transfer  model  using 
the  known  conditions  may  be  compared  with  the  observed  radiance  and  syste 
matic  errors  removed.  Depending  on  the  error  source,  these  comparisons 
may  also  be  used  to  refine  absorption  coefficient  models.  Finally,  the 
noise  performance  of  the  total  system  may  be  evaluated. 

Once  the  calibration  of  step  6  has  been  performed,  the  actual 
performance  analysis  of  step  7  may  begin  and  the  error  performance 
achieved  may  be  compared  with  the  error  predicted  error  performance. 
Standard  error  analysis  that  produces  error  means  and  covariances  will 
be  sufficient,  although  it  will  be  of  interest  to  see  if  there  exist 
special  situations  that  degrade  the  performance  more  than  others. 

Since  the  data  set  of  step  5  is  instrumental  in  both  steps  6  and  7, 
it  will  now  be  considered  at  greater  length.  It  is  obviously  desirable 
that  the  data  set  consist  of  a  large  collection  of  "verified"  data.  It 
is  important,  however,  to  determine  the  source  and  accuracy  of  this  data. 
If  the  data  is  collected  in  a  biased  or  inaccurate  form,  the  calibration 
of  step  6  and  error  analysis  of  step  7  will  be  likewise  biased  or  in¬ 
accurate. 

The  most  fundamental  requirement  of  the  data  set  is  that  it  should 
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contain  actual  observations  of  the  state  of  the  observed  scene  that  are 
coincident  with  the  instrument  overflight  in  both  space  and  time.  This 
is  a  non-trivial  requirement.  The  rapidly  varying  nature  of  cloud 
parameters  in  both  space  and  time  suggest  that  to  be  useful,  the  "co¬ 
incident"  observation  must  be  within  an  hour  in  time  and  ^30  km  in 
space  to  be  useful.  During  periods  of  rapid  change  such  as  occur  during 
the  passage  of  a  front,  even  these  limits  may  be  too  large.  This  data 
requirement  will  imply  that  the  verification  data  will  of  necessity  be 
collected  independently  of  normal  synoptic  observations.  In  particular, 
gridded  analysis  data  will  be  of  little  use  as  it  will  be  too  "smooth" 
to  represent  the  actual  conditions. 

Turning  to  the  exact  nature  of  the  observations  required,  it  is 
possible  to  identify  two  general  classes.  For  the  purposes  of  forward 
model  calibration  and  error  analysis  in  the  continuous  parameter  scheme, 
a  complete  specification  of  the  atmospheric  state  is  required.  For  the 
purposes  of  the  verification  of  the  scene  classification  algorithm,  the 
correct  scene  class  need  only  be  known. 

Of  these  two  requirements,  the  scene  classification  verification  data 
is  the  easiest  to  acquire.  The  requirement  for  this  data  is  to  classify 
the  scene  into  one  of  four  categories: 

1.  No  Cloud,  No  Snow 

2.  No  Cloud,  Snow 

3.  Water  Cloud 

4.  Ice  Clouc 
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An  overflight  of  an  airplane  in  conjunction  with  an  instrument  orbit 
will  yield  a  large  amount  of  data.  The  parameters  that  need  to  be 
classified  may  be  recorded  by  simple  photographic  techniques  and  clas¬ 
sified  by  simple  human  subjective  judgement. 

The  requirements  for  verification  data  for  tuning  and  error  analysis 
of  the  continuous  variables  are  very  difficult  to  meet.  First,  a  large 
number  of  variables  must  be  recorded.  These  variables  are: 

1.  Temperature  profiles 

2.  Water  vapor  profiles 

3.  Surface  wind  speed 

4.  Surface  character  and  temperature 

5.  Cloud  profiles  in  terms  of  density,  altitude  and  phase 

6.  Instantaneous  rainfall  rate. 

Further,  these  variables  will  be  required  over  both  land  and  ocean  sur¬ 
faces.  The  requirement  for  ship  station  data  for  ocean  surface  observa¬ 
tions  will  present  (and  has  historically  presented)  a  large  problem  in 
the  implementation  of  the  data  collection  due  to  their  low  number.  In 
either  case,  the  data  collection  will  have  to  be  done  asynoptical ly  to 
coincide  with  the  instrument  passage. 

The  present  means  of  data  collection  for  the  atmospheric  state 
measurements  and  their  approximate  present  accuracies  are  given  in  Table 
7.1.  As  may  be  seen,  the  gas  parameter  measurements  such  as  temperature 
are  usually  of  higher  accuracy  than  the  particle  measurements  such  as 
the  iniegrated  liquid  water  content  of  a  cloud.  Unfortunately,  the 
importance  of  the  measurements  in  terms  of  the  verification  is  almost 
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in  reverse  order  of  their  accuracy  since  the  satellite  measurements  are 
intended  to  infer  particle  parameters.  It  is  an  open  question  as  to  the 
adequacy  of  these  present  measurement  techniques  to  provide  a  sufficient 
data  set  at  reasonable  cost.  Hopefully,  advances  in  the  state  of  the  art 
in  lidar,  acoustic  radar  and  in  situ  aircraft  measurements  will  allow 
verification  on  a  more  accurate  and  less  costly  basis. 

In  summary,  a  plan  for  implementation  and  verification  has  been  out¬ 
lined.  Seven  basic  steps  or  phases  have  been  outlined.  Of  these  steps, 
the  most  difficult  and  expensive  task  will  be  the  collection  of  a  suf¬ 
ficiently  large  and  accurate  data  set  for  the  tuning  and  verification 
of  continuous  parameters.  This  difficulty  is  caused  by  both  the  general 
lack  of  ship  stations  and  the  difficulty  in  the  measurements  of  particle 
parameters,  Some  advance  in  the  state  of  the  art  in  particle  parameter 
measurements  seems  desirable. 
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TABLE  7.1 


Parameter 

Primary 

Measurement 

Instrumentation 

Estimated 

Accuracy 

Temperature 

Profile 

Radiosonde  borne 
thermistor 

1°K 

Humidty 

Profile 

Radiosonde  borne 
hydristor 

±5-10%  of 
true  humidity 

Average  Surface 

Wind  Speed 

Cup  anemometer 

1-2  m/sec 
depending  on 
variabil ity 

Cloud  Base 

In  situ  aircraft 
radar 

1  idar 

acoustic  radar 
human  estimate 

50m- 500m 
depending  on 
thickness, 
altitude  and 
variabil ity 
within  cloud 

Cloud  Top 

In  situ  aircraft 
radar 

1  idar 

acoustic  radar 

50m- 500m 
dpending  on 
thickness, 
altitude  and 
variability 
within  cloud 

Rainfall  Rate 

In  situ  aircraft 
radar 

1  idar 

rain  gauge 

factor  of  2 
of  true  value 

Integrated 

Liquid  Water 
within  Cloud 

In  situ  aircraft 
radar 

1  idar 

factor  of  2 
of  true  val ue 

Surface 

Temperature 

Thermistor 

1°K 

depending  on 

surface 

character 
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VIII.  CONCLUSIONS  AND  RECOMMENDATIONS 

The  numerical  experiments  conducted  during  this  study  for  estimation 
of  continuous-valued  parameters  have  produced  several  significant  results, 
predicated  on  the  assumed  data  base.  These  are  summarized  in  Table  8.1  and 
in  the  following: 

(1)  Integrated  liquid  water  content  within  clouds  appears  resolvable 

2 

using  microwave  data  to  within  about  0.02  gm/cm  with  no  rainfall. 

? 

With  rainfall,  the  error  increases  to  about  0.1  gm/cm  .  The  1.6 
and  3.8  micron  channels  can  also  be  used  in  the  absence  of  micro- 
wave  data. 

(2)  Ocean  surface  wind  speed  can  be  estimated  to  within  about  3  m/sec 
using  the  19  and  37  GHz  channels. 

(3)  Systematic  atmospheric  temperature  deviations  from  an  assumed 
profile  cannot  be  estimated  to  any  significant  degree  from  micro- 
wave  data. 

(4)  Single-layer  cloud  thickness  can  be  determined  to  within  about 
600  meters  from  microwave  data.  Principal  discriminator  is  the 
37  GHz  channel,  with  smaller  contributions  from  the  19  and  94  GHz 
channels.  The  12.5  and  3.8  micron  channels  are  also  significant 
predictors,  both  with  and  without  coincident  microwave  data. 

(5)  Cloud  top  height  can  be  determined  to  within  about  900  meters 
using  the  given  microwave  channels.  The  most  significant  predictor 
is  the  94  GHz  channel:  however,  the  19  and  37  GHz  channels  also 
contribute.  The  10.5  micron  channel  contributes  significantly  to 
cloud  height  determination,  and  a  contribution  is  also  made  from 


TABLE  8.1 


Summary  of  Predictors  for 
Continuous-Valued  Parameters 


XX  significant 
X  moderate 
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the  1.0  micron  channel.  When  visible,  infrared  and  microwave 
channels  are  used  in  combination,  the  10.5  and  1.0  micron  chan¬ 
nels  are  significant  predictors. 

(6)  Atmospheric  water  vapor  cannot  be  estimated  to  any  great  degree 

of  accuracy  from  the  given  microwave  channels.  The  error  stand- 

2 

ard  deviation  could  be  reduced  to  only  about  0.3  g/cm  during  the 
experiments  conducted  in  this  study.  The  visible  and  infrared 
channels  do  not  contribute. 

(7)  Rainfall  rate  can  be  determined  to  within  about  3  mm/hr.  The 
principal  predictor  is  the  19  GHz  channel,  with  contributions 
also  made  by  the  10.7  GHz  channel. 

The  discrete-valued  parameter  section  of  this  report  lias  produced  a 
number  of  results.  The  first  and  most  important  is  that  it  is  feasible  to 
separate  clear  columns  from  water  clouds  and  from  even  thin  or  low  ice 
clouds.  This  is  important  both  in  terms  of  performing  the  conditional 
inversion  schemes  of  earlier  sections  and  in  determining  basic  desired 
meteorological  parameters. 

A  second  important  result  of  this  study  has  been  the  background  class¬ 
ification  algorithms  that  have  been  developed.  These  experiments  show  that 
it  is  possible  to  discriminate  clear  columns  over  a  loam  background  and 
clear  columns  over  a  snow  background  from  each  other  and  various  cloud 
conditions  with  a  very  low  error  rate. 

A  final  result  of  the  discrete  values  parameter  portion  of  this  effort 
was  that  no  single  channel  or  spectral  interval  was  of  overriding  importance. 
All  intervals  specified  were  seen  to  participate  at  some  point  in  the  decision 


trees  formulated. 
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In  addition  to  the  numerical  results,  several  detailed  comparisons 
of  methodologies  have  been  performed.  The  overall  result  is  that  the 
Bayesian  methods  are  preferred  over  regression  techniques  due  to  their 
desirable  robust  properties  in  the  presence  of  noise  and  modeling  errors. 
Much  more  work  needs  to  be  performed  to  develop  the  most  efficient  tech¬ 
niques  and  to  explore  the  extension  to  real-time  dynamic  parameter  tracking. 
However,  it  is  felt  that  the  results  of  this  study  demonstrate  that 
Bayesian  techniques  should  be  seriously  considered  for  implementation  into 
the  3DNEPH  program  to  extend  its  capability  for  inference  of  meteorological 
parameters. 
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APPENDIX  A 

COMPARISON  OF  INVERSION  METHODS 

It  is  instructive  to  compare  the  various  inversion  techniques 
explored  in  this  study  by  means  of  a  simple  numerical  example.  We 
will  consider  here  the  following  methods: 

(1)  linear  regression 

(2)  extended  Kalman  Filter 

(3)  iterated  extended  Kalman  Filter 

(4)  modal  estimator 

The  n-dimensional  state  x  to  be  estimated  is  assumed  to  be  normally 

distributed  with  mean  m  and  covariance  matrix  P  .  Ine  p-dimensional 

o 

measurement  y  is  nonlinear  in  x: 

y  =  h(x)  +  v  (A. 1 ) 

where  v  is  zero-mean  additive  noise,  normally-distributed  with  covariance 
matrix  V.  The  conditional  probability  density  of  x,  given  y,  is: 

p(x|y)  =  c  exp  j -  1  (x-m)T  P"1  (x-m)  -  ^  (y-h(x))T  V"1  (y-h(x))j  (A. 2) 

where  c  is  a  normalization  constant. 

The  estimators  are  given  as  follows: 

(1)  Linear  Regression 

x^  =  a  +  By  (A. 3) 

where  the  n*  1  vector  a  and  the  nxp  matrix  B  are  found  by  minimizing 


E[U-x)T  U-x)] 
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with  respect  to  a,  B  and  E(*)  denotes  expectation  with  respect  to  the 
underlying  probability  density  or,  in  some  cases,  is  an  arithmetic 
average  over  an  ensemble  of  data. 


(2)  Extended  Kalman  Filter 

XEK  =  1,1  +  K[y-h(m)] 

K  =  PQ  HT  [HPqHT  +  V]'1 

where  H  = 

*x  x=m 


(A. 4) 
(A. 5) 

(A. 6) 


(A. 12) 


-266- 


(4)  Modal  Estimator 

At  the  mode  of  p(y|x)  we  have,  from  (A. 2): 

^  j^(x-m)T  PQ  (x-m)  +  (y-h(x))TV_1  (y-h(x))J  „  =  0  (A. 13) 

x  Xm 


This  becomes,  after  simplifying: 


=  m  +  P,X  V  (y-h(x  ) ) 
m  o  m  m 


„here  H  -  ^ 
m  8x 


(A. 14) 
(A. 15) 


x=x. 


For  nonlinear  h( • ) ,  (A. 14)  may  not  be  solvable  directly  and  one  may  have 
to  rely  on  iterative  methods. 

We  will  now  consider  the  following  scalar  example: 
h(x)  =  x2,  x  =  2 

V  =  1  ,  V  =  1 

m  "  1  •  P0  =  1 


Thus  .y  =  5.  The  four  methods  yield  the  following  results. 


(1)  Linear  Regression 
xL  =  a  +  by 


2  m  P„ 

u  —  0 

b - y-j - ** 

(2P  +  m  ;  +  V  -  2Pf 

O  0 

1 

=  4 

2 

a  =  m  -  b(PQ  +  m  ) 

1 

2 


where 
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Thus 


xL  =  1.75 


(2)  Extended  Kalman  Filter 

K  =  2 
K  5 


XEK  1  +  5  ^ 


=  2.6 


(3)  Iterated  Extended  Kalman  Filter 


,0)  . 


1.  M 


(D  .  2 


e(2)  =  1  +  |{  5  -  1  +  2(1-1  )| 


=  2.6 


M<2>  =  _ 203) 


T 


5(4(13/5)c  +  l) 
=  .185 


6(3)  =  1  +  .185  |  5  -  (13/5)2  +  (26/5) (13/5  -  l)j 


=  2.2136 
=  .215 


=  2.18 
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(4)  Modal  Estimator 

xm  =  2.175 
m 


However,  successive  approximation  won't  work  here,  nor  will  a 
gradient  method  starting  with  m=l  as  the  initial  estimate.  A  gradient 


method  starting  from  x^K  will,  however,  work.  This  extreme  sensitivity 
was  found  to  be  a  problem  in  multivariate  cloud  parameter  determination 
from  microwave  data.  Hence,  the  modal  estimator  is  not  recommended  as  a 
viable  method  at  this  time.  Further  study  is  required  to  find  robust 


methods  of  finding  xm  and  evaluating  its  accuracy  relative  to  the  other 
methods. 
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APPENDIX  B 

CRAMER- RAO  BOUND  FOR 
ESTIMATION  ERRORS 

The  Cramer-Rao  bound  provides  an  easily-computed  performance  limit 
on  the  accuracy  with  which  parameters  can  be  estimated  from  data. 

The  estimation  problem  under  consideration  is  the  following.  The 
parameters  are  represented  by  the  nxl  vector  x  which  is  Gaussian  distri¬ 
buted  with  mean  m  and  covariance  matrix  P  .  Information  about  x  is 

o 

obtained  from  a  single  measurement  of  the  form 

y  =  h(x)  +  v  (B.l ) 

where  h(*)  is  a  known  linear  or  nonlinear  function  and  exists  over 

the  domain  of  x,  fi  .  The  measurement  noise  v  is  a  zero-mean  Gaussian 

random  variable  with  covariance  matrix  V,  and  is  independent  of  x. 

The  information  available  to  estimate  x  is  contained  in  y  and  the 
parameters  m,  P  ,  V.  Thus,  we  may  write  the  estimate  of  x  as 

x  =  g(y;m,P0,V)  (B.2) 

The  objective  is  to  find  g(*;»)  such  that  some  measure  of  the  estimation 

A 

error,  e  =  x-x,  is  minimized.  Ue  choose  the  quadratic  form 

C  =  E[eTe]  (B.3) 

where  E  denotes.  exDectation  over  all  underlying  random  variables.  In 
practice,  we  have  used  ensemble  averages  over  the  available  data  in  the 
data  base. 
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The  determination  of  g(  ;  )  to  minimize  C  is  straightforward  if 
h(-)  is  linear  and  yields  the  Kalman  Filter  of  Section  III.  However, 
difficulties  arise  if  h(*)  is  nonlinear.  In  fact,  no  general  solution 
to  the  problem  exists  at  the  present  time  and  aporoximations  must  almost 
inevitably  be  made.  Rather  than  search  for  "good"  nonlinear  solutions 
it  is  more  useful  in  many  cases  to  investigate  performance  bounds.  A 
particularly  useful  tool  is  the  Cramer-Rao  bound,  which  can  be  easily 
computed  in  this  case,  and  yields  a  matrix  R  such  that 

R  <_  E[e  eT]*  (B.4) 

for  arbitrary  g(y,m,PQ,V) .  Furthermore,  equality  holds  if  and  only  if 
h(-)  is  linear.  Notice  that  this  implies  the  optimality  of  the  Kalman 
Filter  for  linear  measurements. 

We  now  turn  to  computation  of  R. 

A. 1  General  Results 

Let  D(x|y)  be  the  density  function  of  x,  conditioned  on  y  and  define 
H ( x | y )  =  log  p(x|y)  (B.5) 


Let  s  be  an  n x  l  vector  with  elements 


s 

si  3xi 


Then  the  Fisher  information  matrix  is 


J  =  E[s  s'] 


(B.6) 


(B.7) 


■m •  ■  <  i negua 1 i ty  A  <  B  means  that 


uT  A  u 


uT  B  u 


for  all  real  u. 
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Then,  following  VanTrees  (1969),  the  lower  bound  is  of  the  form 

R  =  J'1  (B.8) 

if  the  estimate  is  unbiased;  i.e. 

E[g(y;m,PQ,V)  -  x]  =  0  (B.9) 

which  is  a  desirable  property  of  an  estimator  in  practice. 

A. 2  Evaluation  of  J  for  Gaussian  Random  Variables 
For  x,  v  Gaussian,  we  have 

logp(x|y)  =  const  -  j  (x-m)T  P^1  (x-m)  -  ^  (y-h(x))T  V"1  (y-h(x))  (B.10) 

from  which 

s  =  hj  V"1  (y-h(x))  -  P^1  (x-m)  (B.ll) 

where  h  =  3h/ax  and  [ah/ax] . .  =  ah. (x)/ax. 

X  *  J  •  J 

Thus,  setting  ax  =  x  -  m  and  using  y-h(x)  =  v,  we  have 

J  =  E  [hj  V-1v  -  P~0]  ax]  [vT  V'1  hx  -  AxT  P'1]  (B.  12) 

Using  E(axaxT)  =  PQ 
E(v  vT)  =  V 
E(vaxT)  =  0 
in  (B. 12)  gives 

j  -  p;'  ,  e(i7  v-’  nx> 
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Thus,  for  an^  unbiased  estimator 
x  =  g(y;m,Po,V) 

E[(x-x)(x-x)T]  >  J-1 
which  is  the  desired  result. 

Note,  in  particular,  that  the  variance  of  error  for  the  ith  parameter 
is  bounded  by 

var(x^)  =  E[(xi  -  xi)2] 

>  [J"1].. 

This  result  is  used  in  Section  III  to  provide  performance  bounds  using 
different  data  channels  and  different  noise  magnitudes. 


